AD2}257  643 

iiiuiii 


4 


SYSTEMS  OPTIMIZATION  LABORATORY 
DEPARTMENT  OF  OPERATIONS  RESEARCH 
STANFORD  UNIVERSITY 
STANFORD,  CALIFORNIA  94305-4022 


Large-Scale  Sequential  Quadratic 
Programming  Algorithms 

by 

Samuel  K.  Eldersveld 


TECHNICAL  REPORT  SOL  92-4 
September  1992 


DT1C 

ELECTE 
NOVO  9 1992 

E 


92-29086 

11  n  K  n  n  q 

Research  and  reproduction  of  this  report  were  partially  supported  by  the  National  Science 
Foundation  Grant  DDM-9204208,  the  Department  of  Energy  Grant  DE-FG03-92ER25117, 
Office  of  Naval  Research  Grant  N00014-90-J-1242,  and  an  IBM  Graduate  Technical  Fellow¬ 
ship. 

Any  opinions,  findings,  and  conclusions  or  recommendations  expressed  in  this  publication 
are  those  of  the  author  and  do  NOT  necessarily  reflect  the  views  of  the  above  sponsors. 

Also  listed  as  Operations  Research  Department  Technical  Report  92-11.  Reproduction  in 
whole  or  in  part  is  permitted  for  any  purposes  of  the  United  States  Government.  This 
document  has  been  approved  for  public  release  and  sale;  its  distribution  is  unlimited. 


mm 


tyO  %76‘? 


\<)( 


LARGE-SCALE 

SEQUENTIAL  QUADRATIC  PROGRAMMING  ALGORITHMS 


Samuel  Keith  Eldersveld,  Ph.D. 
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Abstract 

The  problem  addressed  is  the  general  nonlinear  programming  problem:  finding  a  local 
minimizer  for  a  nonlinear  function  subject  to  a  mixture  of  nonlinear  equality  and  inequal¬ 
ity  constraints.  The  methods  studied  are  in  the  class  of  sequential  quadratic  programming 
(SQP)  algorithms,  which  have  previously  proved  successful  for  problems  of  moderate  size. 
Our  goal  is  to  devise  an  SQP  algorithm  that  is  applicable  to  large-scale  optimization  prob¬ 
lems,  using  sparse  data  structures  and  storing  less  curvature  information  but  maintaining 
the  property  of  superlinear  convergence.  The  main  features  are: 

1.  The  use  of  a  quasi- Newton  approximation  to  the  reduced  Hessian  of  the  Lagrangian 
function.  Only  an  estimate  of  the  reduced  Hessian  matrix  is  required  by  our  algo¬ 
rithm.  The  impact  of  not  having  available  the  full  Hessian  approximation  is  studied 
and  alternative  estimates  are  constructed. 

2.  The  use  of  a  transformation  matrix  Q.  This  allows  the  QP  gradient  to  be  computed 
easily  when  only  the  reduced  Hessian  approximation  is  maintained. 

3.  The  use  of  a  reduced-gradient  form  of  the  basis  for  the  null  space  of  the  working 
set.  This  choice  of  basis  is  more  practical  than  an  orthogonal  null-space  basis  for 
large-scale  problems.  The  continuity  condition  for  this  choice  is  proven. 

4.  The  use  of  incomplete  solutions  of  quadratic  programming  subproblems.  Certain 
iterates  generated  by  an  active-set  method  for  the  QP  subproblem  are  used  in  place 
of  the  QP  minimizer  to  define  the  search  direction  for  the  nonlinear  problem. 

An  implementation  of  the  new  algorithm  has  been  obtained  by  modifying  the  code 
MINOS.  Results  and  comparisons  with  MINOS  and  NPSOL  are  given  for  the  new  algorithm 
on  a.  set  of  92  test  problems. 
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Chapter  1 

Introduction 


The  problem  addressed  in  this  report  is  that  of  finding  a  local  minimizer  for  a  general 
nonlinear  function  F(x )  subject  to  a  set  of  nonlinear  constraints  c(x)  >  0.  This  is  the 
general  nonlinear  programming  problem  (NLP): 


minimize 

F(x) 

r€*n 

S.t. 

c(x)  >  0, 

where  F  :  —  #  and  c  :  -  »m. 

There  are  a  number  of  mathematically  equivalent  forms  of  NLP.  The  relevance  of  the 
precise  form  of  the  problem  to  the  efficiency  of  specific  algorithms  is  discussed  later.  We 
assume  that  the  objective  function  F(x)  and  the  nonlinear  constraint  functions  c,(x),  i  = 
1, . . . ,  m,  are  twice  continuously  differentiable. 

A  wide  variety  of  algorithms  exist  for  solving  NLP,  none  of  which  can  be  considered 
preferable  for  all  problems.  For  a  general  discussion  of  NLP  the  reader  is  referred  to 
Fletcher  [Fle87]  and  Gill  et  al.  [GMW81].  For  a  recent  survey  of  methods  see  [GMSW89]. 

The  particular  focus  of  this  report  is  the  case  of  large,  sparse  NLP.  By  large  and  sparse 
we  mean  that  we  are  concerned  with  the  instances  of  problem  NLP  in  which  n  is  large, 
the  m  x  n  Jacobian  of  the  nonlinear  constraints  is  sparse,  and  usually  n  —  m  <  n.  Al¬ 
though  many  algorithms  have  been  proposed  to  solve  NLP,  few  have  been  adapted  for  the 
large  sparse  case.  A  notable  exception  is  the  Lagrangian  method  of  Murtagh  and  Saun¬ 
ders  [MurS82].  This  algorithm  has  been  implemented  as  the  mathematical  programming 
system  MINOS  [MurS87]. 

There  is  a  concensus  that  the  best  methods  for  solving  NLP  when  n  is  small  are  so- 
called  sequential  quadratic  programming  (SQP)  methods.  Such  methods  make  use  of  local 
curvature  information  to  construct  a  quadratic  programming  (QP)  model  or  subproblem  of 
NLP.  A  local  minimizer  is  found  by  solving  a  sequence  of  these  QP  subproblems.  The  rate 
of  convergence  of  SQP  methods  is  usually  superiinear  under  certain  assumptions  on  the 
closeness  of  the  quadratic  approximation.  We  are  concerned  with  developing  large-scale 
SQP  algorithms  that  are  globally  convergent  to  a  local  minimizer  of  NLP,  and  have  a  fast 
rate  of  convergence. 

All  methods  for  solving  NLP  are  iterative.  In  the  case  when  n  is  small  the  efficiency 
of  an  algorithm  is  usually  measured  in  terms  of  the  number  of  iterations,  or  possibly  the 


1 


2 


Large-Scale  SQP  Algorithms 


number  of  function  evaluations,  required  to  attain  some  specified  approximation  to  the 
solution.  In  the  large  sparse  case  we  also  need  to  be  concerned  with  the  effort  required 
to  compute  the  iterates.  It  is  sometimes  worthwhile  to  modify  the  definition  of  the  iter¬ 
ative  sequence  in  order  to  compute  the  iterates  more  efficiently.  We  may  then  take  more 
iterations,  but  the  savings  in  effort  to  compute  the  iterates  is  sufficient  compensation. 

1.1.  Notation  and  definitions 

Our  notation  in  this  report  follows  that  used  in  [GMSW86b]  and  [Pri89].  In  addition  to 
F,  x  and  c  defined  above  we  shall  use  the  following  definitions  and  conventions: 

•  Subscripts  on  a  function  denote  the  value  of  the  function  evaluated  at  the  variable 
with  the  same  subscript  (for  example,  F^.  =  F(x *)). 

•  Bars  on  functions  or  variables  or  data  will  often  be  used  to  denote  updated  quanti¬ 
ties  (for  example,  when  £/t+i  corresponds  to  the  new  iterate,  the  new  value  of  the 
constraints  is  denoted  c  =  c(i;fe+i)). 

•  A  is  the  vector  of  Lagrange  multiplier  estimates  for  c. 

•  L(x,  A)  =  F(x)  -  \Tc(x)  is  the  Lagrangian  function. 

•  g(x)  =  VF(x)  is  the  n  x  1  gradient  vector  for  F. 

•  J{x)  is  the  m  x  n  matrix  of  gradients  for  the  constraint  functions  (the  Jacobian). 
Then  Jl3  =  dcl/dx}. 

•  A(x)  =  (  J(x)  -/  )  is  the  constraint  matrix  for  QP  subproblems.  Often  we  will 
refer  to  a  partition  of  A  as  in  A  =  (  B  S  N  ) ,  where  B  is  nonsingular. 

•  We  will  often  refer  to  a  partitioned  vector  as  in 

l  ) 

P={  Pb .  Ps>  Pn  )  =  Ps  • 

\P»  ) 

In  this  notation  the  commas  denote  that  the  partitioned  vector  has  a  column  dimen¬ 
sion  of  one. 

•  Z  denotes  a  basis  for  the  null  space  of  a  matrix  of  the  form  A  = 

•  Y  denotes  a  matrix  such  that  (  Z  1"'  )  is  nonsingular. 

•  Q  is  a  transformation  matrix  of  the  form  Q(x)  =  (  Z(x)  T(x)  ). 

•  G(x)  denotes  the  Hessian  of  F(x).  Then  G,j  =  d2 F/diidxj. 

•  Gi(x)  denotes  the  Hessian  of  Ci(x). 


1.2  Optimality  conditions  for  NLP 


3 


•  W^x.A)  =  G(x)  -  £  A iGi(i)  is  the  Hessian  of  the  Lagrangian  function. 

•  H  is  an  n  x  n  approximation  to  I'V ( a: ,  A ) . 

•  s  is  the  m-vector  of  slack  variables  for  constraints  c  such  that  c(x)  -  s  =  0. 

•  p  is  the  m-vector  of  QP  multipliers  (for  A). 

•  £  =  p  -  A  is  the  search  direction  for  Lagrange  multiplier  estimates. 

•  p  is  the  search  direction  for  x. 

•  q  =  (c  -  s)  +  Ap  is  the  search  direction  for  slack  variables. 

•  x*  is  a  solution  of  the  constrained  optimization  problem  NLP. 

•  A*  is  the  vector  of  Lagrange  multipliers  at  x*. 

The  above  notation  will  be  defined  again  when  the  terms  involved  are  first  introduced  in 
the  text.  This  list  is  intended  to  be  a  convenient  reference  to  save  searching  for  definitions 
in  the  text. 

1.2.  Optimality  conditions  for  NLP 

A  point  x*  is  a  weak  local  minimizer  of  NLP  if  c( x* )  >  0  and  there  exists  a  f>  >  0  such 
that  F(x)  >  F(x*)  for  all  x  satisfying 

||.r  -  .r*||  <  b  and  c(.r)  >  0.  (1.2.1) 

If  F(x)  >  F(x*)  for  all  .r  f  x*  satisfying  ( 1.2.1 ),  x*  is  defined  as  a  strong  local  minimizer. 

The  above  definition  of  a  local  minimizer  is  of  little  aid  in  determining  x*  or  verifying 
that  some  given  point  is  indeed  a  minimizer.  Most  optimization  methods  (and  all  of  the 
algorithms  described  in  this  report)  determine  x*  by  seeking  points  that  satisfy  verifiable 
optimality  conditions  on  x*.  These  conditions  are  characterized  by  the  first  and  second 
derivatives  of  the  Lagrangian  function  (see  for  example  [FiaM68]).  Assuming  that  the 
Jacobian  of  active  constraints  at  a  solution  to  NLP  has  full  rank,  we  now  give  optimality 
conditions  for  NLP. 

Necessary  conditions  for  x*  to  be  a  local  minimizer  are  that  there  exist  multipliers  A* 
such  that  (x*,\*)  satisfy  the  Iuirush-Kuhn- Tucker  ( KKT )  second- order  conditions: 


c(x* )  > 

0; 

(1.2.2) 

VF(.r*)  = 

a*t\*\ 

(1.2.3) 

Al 

•* 

0; 

(1.2.4) 

w*z* 

is  positive  semi-definite, 

(1.2.5) 

where  IV*  -  W(x*,\*)  is  the  Hessian  of  the  Lagrangian,  A*  is  the  Jacobian  of  the  active 
constraints  at  x*  and  Z*  is  a  basis  for  the  nullspace  of  A* .  Sufficient  conditions  for  ( x* ,  A*) 
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to  be  a  local  minimizer  are  that  there  exist  multipliers  A*  such  that  (a;*,  A*)  satisfy  the 
KKT  conditions  ( 1 .2.2)— ( 1.2.3)  and 

A*  >  0;  (1.2.6) 

Z*T\V* Z*  is  positive  definite.  (1-2.7) 

A  point  x*  with  multipliers  A*  satisfying  only  KKT  conditions  (1.2.2)-(1.2.4)  will  be 
referred  to  as  a  first-order  KKT  point,  while  a  point  satisfying  only  KKT  conditions 
( 1.2.2)— ( 1.2.3)  will  be  referred  to  as  a  constrained  stationary  point.  It  should  be  noted 
that  the  algorithms  developed  for  NLP  in  this  report  do  not  require  the  provision  of 
analytical  second  derivatives.  As  a  result,  the  algorithms  presented  in  Section  2  only 
guarantee  that  a  computed  solution  x*  is  a  first-order  KKT  point.  Despite  this  theoretical 
restriction  on  all  algorithms  that  do  not  evaluate  second  derivatives,  it  is  important  that 
such  algorithms  still  attempt  to  seek  a  minimizer  and  not  simply  a  first-order  KKT  point. 

Consider  the  case  of  unconstrained  minimization.  We  could  simply  generate  iterates 
that  reduce  gTg.  In  so  doing  we  would  converge  to  a  first-order  KKT  point  that  could 
be  either  a  maximizer,  a  saddle-point  or  a  minimizer.  If  on  the  other  hand  we  generate 
iterates  that  reduce  F(x),  we  could  still  only  be  assured  of  finding  a  stationary  point  (i.e. 
g(x)  =  0)  but  it  is  more  likely  to  be  a  minimizer.  The  generalization  of  this  idea  for  a 
constrained  problem  involves  choosing  a  suitable  merit  function  (see  Section  1.3). 

It  is  also  important  to  note  that  without  a  strong  assumption  on  the  form  of  the 
problem  it  is  not  possible  to  distinguish  between  local  and  global  solutions. 

1.3.  SQP  algorithms 

It  is  not  possible  in  general  to  determine  an  optimal  solution  to  NLP  in  a  finite  number  of 
iterations,  except  in  special  cases  such  as  linear  and  quadratic  programs.  SQP  algorithms 
construct  a  sequence  wh°se  limit  points  are  KKT  points.  Given  a  point  x*,  we 

may  obtain  a  new  point  x^+i  by  solving  a  mathematical  programming  problem  whose 
solution  .Tjt+i  approximates  x* .  One  method  of  approximating  the  optimal  step  (x*  -  x*) 
is  to  find  a  minimizer  of  a  local  approximation  to  the  problem.  For  SQP  methods  the 
model  problem  takes  the  form  of  a  QP  subproblem  in  which  a  quadratic  approximation  is 
made  to  the  Lagrangian  and  linear  approximations  are  made  to  the  nonlinear  constraints. 
For  each  x k  in  the  sequence,  the  fc-th  QP  subproblem  may  be  stated  as  follows: 

QP 

where  (jk  =  VF(xk)  and  Ilk  is  an  approximation  to  VF(x*,Ajt),  the  Hessian  of  the  La¬ 
grangian.  The  solution  and  Lagrange  multipliers  are  denoted  by  the  pair  (p*,/r*)- 

Ideally,  we  hope  to  accept  x*  +  Pk  as  the  next  iterate,  especially  near  the  solution. 
However,  the  QP  subproblem  is  defined  only  by  local  information  (i.e.  at  the  current 
iterate);  the  solution  of  the  model  problem  may  be  a  poor  approximation  of  the  solution 
to  NLP  when  the  current  iterate  is  not  close  to  x* .  Hence,  we  regard  the  solution  to  QP  as 
a  search  directum  pk  that  will  be  a  descent  direction  for  some  merit  function,  as  discussed 
next. 
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1.3.1.  Merit  functions 

In  contrast  to  algorithms  for  unconstrained  and  linearly  constrained  problems,  it  is  not 
practical  in  general  to  generate  a  sequence  {xt}£j_0  such  that  all  points  x*  are  feasible. 
Given  two  feasible  points  we  can  determine  which  is  best  by  comparing  F(x)  evaluated  at 
the  two  points.  Once  the  points  are  allowed  to  be  infeasible  it  becomes  problematical  to 
determine  which  is  best.  To  illustrate  this  difficulty,  consider  Figure  1.  Some  of  the  five 
points  Xj , . . . , X5  are  infeasible  with  respect  to  the  single  inequality  constraint  c(x)  >  0  and 
it  is  not  clear  which  offers  the  best  approximation  to  x*.  Because  the  sequence  {x*}£L0 


in  our  algorithms  may  contain  infeasible  points  it  is  necessary  to  order  the  iterates  in 
some  way  other  than  simply  noting  which  iterate  has  the  lowest  function  value.  Many 
algorithms  for  NLP  do  this  by  means  of  a  merit  function ,  which  is  used  to  determine  a  step 
at;  along  a  search  direction  p* .  The  requirement  is  that  the  new  point  x*  +  QkPk  reduce 
the  merit  function  by  a  sufficient  amount.  The  points  x*  therefore  form  an  “improving’’ 
sequence. 

One  possible  choice  for  a  merit  function  is  the  augmented  Lagrangian  merit  function 
due  to  Rockafellar  [Roc 73]: 

M(x,X,p)  =  F(x)  -  XTc  +  7/>||c[|2,  (1.3.1) 

where  p  >  0  is  a  penalty  parameter  and  the  vector  c  is  defined  as 


c,(x) 


c,(.r)  if  c;(x)  -  Xi/p  <  0. 
X  i/p  otherwise. 


(1.3.2) 


Note  that  this  merit  function  assigns  a  positive  penalty  for  increasing  constraint  violations. 
To  illustrate  the  use  of  (1.3.1),  consider  the  following  example: 
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minimize  Fix)  =  X\x\ 

x€  Jf2 

s.t.  c(x)  =  2  -  x?  -  x\  >  0. 


The  optimal  solution  is  x*  =  (-0.81650,-1.1547)  with  optimal  Lagrange  multiplier  A*  = 
0.81650.  Figure  1  depicts  the  contours  of  F(x)  with  c(x)  =  0  superimposed  for  this 
example. 

Figure  2  depicts  the  contours  of  the  augmented  Lagrangian  merit  function  M(x,  A*,p) 
for  the  same  problem  as  in  Figure  1,  using  A*  =  0.81650  and  p  =  0.1  and  p  —  1.  Figure  3 


Figure  2.  Contours  of  Ai(x,  A*,/?)  for  p  —  0.1  and  p  —  1 


Figure  3.  Contours  of  M(x ,  A *,/>)  for  p  =  10  and  p  =  100 


shows  the  contours  of  the  augmented  Lagrangian  function  with  p  =  10  and  p  —  100. 
Figures  2  and  3  demonstrate  the  complications  that  may  arise  in  choosing  the  penalty 
parameter  p.  If  p  is  set  too  small  as  in  the  left  part  of  Figure  2,  the  merit  function  may 
become  unbounded  below.  If  p  is  set  too  large  as  in  the  right  part  in  Figure  3.  the  merit 
function  may  become  ill-conditioned. 


1.4  Historical  background 


The  steps  required  by  an  SQP  method  are  summarized  in  Algorithm  1.3.1  below.  Each 
step  will  be  discussed  in  detail  in  Section  2. 


Algorithm  1.3.1.  (Model  SQP  algorithm) 

Start  with  estimates  xo  and  Ho  of  a  solution  and  the  Hessian  of  the  Lagrangian 
at  xq. 

while  not  converged  do 

Set  up  and  solve  a  QP  subproblem  to  obtain  a  search  direction  p^  and  La¬ 
grange  multipliers  /i*. 

Compute  a  steplength  a  to  reduce  some  merit  function. 

Update  x  according  to  Xk+i  +  apic- 

Evaluate  constraints  c  and  gradients  g  and  J  at  x^+i . 

Update  (or  form )  Hk+i,  the  QP  Hessian  to  be  used  in  the  next  subproblem. 

end  do 


Figure  4.  Model  SQP  algorithm 


1.4.  Historical  background 

SQP  methods  for  NLP  were  first  introduced  in  1963.  Here  we  outline  the  development 
of  SQP  methods  since  then  and  emphasize  some  of  the  key  ideas.  For  a  more  detailed 
discussion  of  the  history  of  SQP  methods,  see  [GMW81,  Pow83,  GMSVV88]. 

1.4.1.  Early  SQP  methods 

Wilson  [Wil63]  is  believed  to  have  been  the  first  to  propose  an  SQP  algorithm.  In  his 
doctoral  dissertation  he  proposed  solving  convex  nonlinear  programming  problems  using 
a  sequence  of  inequality  constrained  QP’s  in  which  the  QP  objective  was  defined  using 
the  exact  Hessian  of  the  Lagrangian.  Successive  NLP  iterates  were  obtained  as  x  =  x  +  p 
(i.e.  without  the  benefit  of  a  merit  function  and  linesearch). 

In  1969,  Murray  [Mur69]  proposed  an  SQP  algorithm  employing  a  quasi-Newton  ap¬ 
proximation  to  the  Hessian  of  the  Lagrangian.  He  also  introduced  the  important  concept 
of  using  the  QP  solution  to  define  a  search  direction  and  choosing  the  next  iterate  by 
taking  a  step  to  reduce  a  merit  function.  The  use  of  a  quasi-Newton  approximation  and 
a  linesearch  enabled  Wilson’s  convexity  assumption  to  be  relaxed. 

Notable  developments  in  SQP  algorithms  occurred  throughout  the  1970’s.  Biggs 
[Big72]  proposed  an  algorithm  using  an  equality-constrained  subproblem,  and  a  term  for 
the  multiplier  estimate  was  added  to  the  constraints.  Han  [Han76]  established  sufficient 
conditions  for  local  and  superlincar  convergence  of  an  SQP  algorithm  under  the  assump¬ 
tion  that  the  Hessian  of  the  Lagrangian  is  positive  definite  on  the  whole  space.  Powell 
[Pow78a]  used  the  framework  of  Han  to  provide  a  proof  of  superlinear  convergence  under 
additional  assumptions  on  how  well  the  Hessian  of  the  Lagrangian  is  approximated. 
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1.4.2.  Merit  functions 

Much  research  has  been  done  on  the  choice  of  merit  function  for  SQP  iterates.  Murray’s 
pioneering  approach  used  an  f2  merit  function  [Mur69].  Since  then  the  focus  has  been  not 
on  the  use  of  the  merit  function  but  on  its  form.  Han  [Han76]  and  Powell  [Pow78b]  in 
their  SQP  algorithms  proposed  the  use  of  the  merit  function  (also  known  as  an  exact 
penalty  function), 

M(x,p)  =  F(x)  +  p||c||i,  (1.4.1) 

where  p  is  a  nonnegative  penalty  parameter  and  c  contains  only  the  values  of  constraints 
c(x)  considered  to  be  violated  at  x.  A  virtue  of  the  t\  merit  function  over  the  f2  merit 
function  is  that  there  exists  a  bounded  value  of  p  for  which  x*  is  a  minimizer  of  M(x,p). 
This  latter  property  makes  convergence  proofs  relatively  simple.  However,  the  f j  merit 
function  is  nonsmooth  across  constraint  violations.  Maratos  [Mar78]  in  his  doctoral  disser¬ 
tation  demonstrated  that  imposing  linesearch  conditions  using  this  merit  function  could 
impede  the  superlinear  rate  of  convergence.  To  overcome  this  deficiency  SQP  methods 
based  on  the  l\  merit  function  must  depart  from  a  pure  SQP  strategy. 

As  an  alternative,  consider  the  augmented  Lagrangian  merit  function 

M(x,\,p)  =  F(x)  -  XTc  +  |pcTc,  (1.4.2) 

where  c  is  defined  in  (1.3.2).  Fletcher  [Fle70]  first  proposed  the  use  of  this  merit  function 
(but  not  in  the  context  of  an  SQP  algorithm).  In  contrast  to  (1.4.1),  (1.4.2)  is  a  smooth 
function.  However,  it  requires  estimates  of  the  Lagrange  multipliers  A.  In  general,  x* 
is  a  minimizer  of  M(x,\,p)  only  if  A  =  A*.  This  requirement  makes  convergence  proofs 
for  SQP  methods  using  (1.4.2)  somewhat  more  difficult  than  proofs  using  (1.4.1).  Both 
Wright  [Wri76]  and  Schittkowski  [SchS2]  proposed  SQP  algorithms  based  on  this  merit 
function. 

Consider  next  an  augmented  Lagrangian  merit  function  defined  in  terms  of  slack  vari¬ 
ables  s  as  well  as  multipliers  A  and  variables  x: 

M(x,s,\,p)  =  F( x )  -  AT(c  -  s)  +  \p{c  -  s)T{c  -  s).  (1.4.3) 

It  is  no  longer  necessary  to  restrict  the  terms  involving  c(x)  in  (1.4.3)  to  some  subset  of 
the  constraints.  The  merit  function  has  the  same  continuity  properties  as  F(x)  and  c(x). 
Gill  el  al.  [GMSWSGb]  proposed  this  merit  function  in  the  context  of  an  SQP  method. 
They  showed  under  certain  assumptions  that  xu  —  x*  and  A*  — *  A*.  In  his  doctoral 
dissertation,  Prieto  [Pri89]  showed  that  a  finite  value  of  p  suffices.  The  steplength  a  is 
determined  by  performing  a  search  in  the  space  of  x,  $  and  A.  This  merit  function  has 
been  implemented  in  the  nonlinear  programming  code  NPSOL  [GMSW86a]. 

1.4.3.  Use  of  the  reduced  Hessian 

For  moderate-sized  problems,  the  most  successful  SQP  algorithms  to  date  have  used  dense 
approximations  to  W,  the  Hessian  of  the  Lagrangian.  A  key  concept  for  large-scale  opti¬ 
mization  is  the  use  of  an  approximation  to  the  reduced  Hessian  Z^^VZ.  This  is  of  prime 
computational  importance  for  the  following  reasons: 

•  A  dense  approximation  to  all  of  IF  may  require  excessive  storage. 


1.4  Historical  background 


9 


•  Computation  of  exact  second  derivatives  may  not  be  possible  or  may  be  too  expen¬ 
sive. 

•  Even  if  W  can  be  evaluated  cheaply,  computation  of  the  matrix  product  Z^\VZ 
from  Z  and  W  may  be  too  expensive  (unless  Z  has  very  few  columns  or  some 
special  structure). 

Gill  and  Murray  [GilM73,  GHM74,  GilM77]  are  credited  with  the  first  use  of  reduced 
Hessian  approximations  for  linearly  constrained  problems.  Murtagh  and  Saunders  in 
[MurS78,  MurS87]  showed  how  to  apply  this  approach  to  the  large-scale  case.  Wright 
[Wri76]  and  Murray  and  Wright  [MurW78]  proposed  the  use  of  a  quasi-Newton  approx¬ 
imation  to  the  reduced  Hessian  for  nonlinearly  constrained  optimization.  Coleman  and 
Conn  [ColC84]  analyzed  an  SQP  method  that  approximated  the  reduced  Hessian  and 
showed  that  the  method  when  applied  to  equality-constrained  problems  converges  2-step 
superlinearly.  Nocedal  and  Overton  [Noc085],  Coleman  and  Fenyes  [C0IF88],  and  Gur- 
witz  and  Overton  (Gur089]  have  all  proposed  algorithms  in  which  approximations  are 
made  to  either  Z^VZ  or  Z^V. 

1.4.4.  Active-set  methods 

Although  interior-point/barrier  methods  could  be  used  within  an  SQP  method,  we  shall 
restrict  our  interest  to  the  solution  of  the  QP  subproblems  using  active-set  methods.  This 
does  not  preclude  the  use  of  barrier  methods  at  the  outer  level  of  the  SQP  algorithm.  That 
is,  inequality  constraints  could  be  removed  by  a  barrier  transformation  and  the  algorithm 
we  propose  used  to  solve  the  resulting  barrier  subproblem. 

For  an  overview  of  active-set  methods,  see  Gill  et  al.  [GMW81]  or  Fletcher  [Fle87]. 

1.4.5.  Early  termination  of  subproblems 

As  problem  size  grows,  the  number  of  iterations  required  by  an  active-set  method  to  solve 
a  QP  subproblem  to  optimality  may  become  large.  In  a  large-scale  SQP  implementation 
it  is  therefore  desirable  to  impose  a  limit  on  the  number  of  QP  iterations  allowed  to  solve 
the  subproblem. 

Murray  [Mur69]  is  credited  as  the  first  to  suggest  this  early-termination  approach. 
Dembo  and  Tulowitzki  [DemT85]  defined  an  early-termination  rule  for  the  QP  subprob¬ 
lem,  based  on  the  norm  of  the  reduced  gradient  ZTg  in  the  subproblem.  Gurwitz  and  Over- 
ton  [Gur089]  presented  an  implementation  of  an  early-termination  algorithm  in  which  a 
subproblem  is  terminated  at  the  first  stationary  point  (i.e.  ZT(g  +  Hp)  =  0).  In  the 
work  of  Prieto  [Pri89],  global  convergence  was  proved  for  an  SQP  algorithm  in  which  the 
search  direction  is  defined  from  information  available  at  any  stationary  point  encountered 
in  the  solution  of  the  QP  subproblem.  Prieto  also  proved  that  use  of  a  reduced  Hessian 
approximation  in  this  context  provides  a  2-step  superlinear  rate  of  convergence. 

1.4.0.  Large-scale  SQP 

Nickle  and  Tolle  [NicT89]  have  described  a  sparse  SQP  approach  in  which  they  maintain 
an  approximation  to  W,  the  full  Hessian  of  t  he  Lagrangian.  They  sacrifice  satisfying  the 
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quasi-Newton  condition  (see  Section  4.2.2)  in  order  to  define  an  H  with  the  same  sparsity 
pattern  as  W. 

1.5.  Contents  and  subsequent  Chapters 

In  Section  2  a  prototype  SQP  algorithm  for  solving  NLP  is  presented.  Each  subproblem 
uses  an  approximation  to  the  reduced  Hessian  of  the  Lagrangian.  A  discussion  of  active- 
set  methods  for  QP  subproblems  is  also  given.  In  Section  3,  important  computational 
building  blocks  are  developed  in  order  that  the  large-scale  QP  subproblems  arising  in  the 
SQP  method  may  be  solved  efficiently.  Section  4  discusses  quasi-Newton  updates  for  the 
reduced  Hessian  of  the  Lagrangian  and  gives  computational  details.  Section  5  presents 
computational  results  for  both  small  and  large  (i.e.  dense  and  sparse)  test  problems  from 
a  variety  of  applications. 


Chapter  2 

A  Prototype  SQP  Algorithm 


In  this  chapter  we  define  the  main  theoretical  tools  for  solving  large-scale  nonlinear  pro¬ 
grams  and  present  an  algorithm  for  solving  NLP.  The  algorithm  allows  the  use  of  incom¬ 
plete  solutions  from  QP  subproblems. 

2.1.  Large-scale  NLP 

Many  SQP  methods  have  been  proposed  for  solving  NLP.  Most  of  them  perform  algebraic 
operations  that  are  appropriate  for  dense  problems,  but  are  not  practical  for  large  and 
sparse  ones.  For  example,  the  storage  required  by  dense  methods  may  become  excessive 
when  there  are  many  variables.  This  section  gives  a  standard  form  for  large-scale  NLP 
and  the  optimality  conditions  modified  for  this  form. 

2.1.1.  The  form  of  the  nonlinear  problem 

In  methods  for  small  or  dense  nonlinear  programming  problems  both  nonlinear  equality 
and  inequality  constraints  (i.e.  cE(x)  =  0  and  c,(x)  >  0)  are  usually  allowed.  In  large-scale 
optimization  the  precise  form  of  the  problem  is  crucial,  and  it  is  often  computationally 
convenient  to  assume  that  the  problem  is  in  the  so-called  nonlinear  programming  standard 
form  (NLPSF): 


NLPSF 


Each  constraint  Cj(i)  is  associated  with  a  slack  variable  s;  with  upper  and  lower  bounds  on 
its  value  as  shown.  The  bounds  on  each  slack  determine  whether  the  associated  constraint 
is  an  equality  or  inequality.  For  example,  if  Ci(ar)  corresponds  to  an  equality  constraint, 
the  bounds  for  slack  s,  are  zero. 

The  definition  of  the  set  of  constraints  considered  to  be  binding  at  a  point  must  be 
modified  for  the  standard  form.  At  any  point  ar,  the  set  of  active  constraints  will  consist 
of  all  functional  constraints  c(x)  -  s  =  0,  as  well  as  the  set  of  active  bounds  at  (x,s). 
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It  may  appear  that  increasing  the  number  of  variables  for  the  problem  in  this  way  is 
computationally  disadvantageous.  For  some  methods  of  optimization  this  is  true.  How¬ 
ever,  it  will  be  seen  that  this  is  not  the  case  for  the  methods  presented  here. 

It  has  been  demonstrated  since  the  earliest  simplex  codes  that  there  are  advantages 
in  using  a  standard  form  involving  slacks.  In  particular,  the  standard  form  only  requires 
access  to  columns  of  the  Jacobian  (as  opposed  to  columns  and  rows).  Also,  when  the 
working-set  basis  matrix  B  is  factorized  (see  Section  2.4),  the  columns  associated  with 
slacks  introduce  no  extra  nonzero  elements  (i.e.  fill-in)  in  the  LU  factors  of  B.  These  same 
advantages  were  retained  for  sparse  NLP  by  Murtagh  and  Saunders  [MurS78,  MurS82). 

2.1.2.  Optimality  conditions  for  large-scale  NLP 

In  the  discussion  of  optimality  conditions  it  will  be  convenient  to  assume  that  the  slack 
variables  s  are  included  in  the  definition  of  the  variables  x.  That  is  we  augment  the 
variables  x  to  include  the  slacks  s: 


so  that  for  the  NLP  in  standard  form,  x  €  3fn+m.  We  also  modify  the  nonlinear  constraints 
to  include  the  slack  variables.  The  NLPSF  may  then  be  written: 


NLPSF' 


As  discussed  in  Section  1,  the  Karush-Kuhn-Tucker  (KKT)  conditions  describe  local 
solutions  to  NLP.  These  are  characterized  by  conditions  on  the  first  and  second  derivatives 
of  the  Lagrangian.  The  NLPSF'  now  has  bound  constraints  on  x  (including  slacks).  This 
will  slightly  change  the  conditions  while  making  the  first-order  conditions  easier  to  identify. 
Define  g*  =  V  F(x* )  and  A*  to  be  the  Jacobian  of  c(x* )  (that  is,  J  is  a  matrix  of  row  vectors 
each  corresponding  to  c,(x*)T  for  i  =  1 ,...,/,  where  t  is  the  number  of  active  functional 
constraints  at  x*).  Let  Z*  be  a  basis  for  the  null  space  of  A*  (so  that  A*  Z*  =  0).  The 
KKT  necessary  conditions  for  ( x* ,  A* )  to  be  a  first-order  KKT  pair  for  NLP  are 


c(  X*  ) 

= 

0, 

(2.1.1) 

X* 

> 

/, 

(2.1.2) 

* 

X 

< 

u, 

(2.1.3) 

A  A  -  7/  +  <7 

= 

* 

5, 

(2.1.4) 

» ?V  -/) 

= 

0, 

(2.1.5) 

oT(u  —  X *  ) 

= 

0, 

(2.1.6) 

»/ 

> 

0, 

(2.1.7) 

a 

> 

0. 

(2.1.8) 

Let  a*  be  the  j-th  column  of  A* .  Since  the  complementarity  conditions  (2.1.5)  and  (2.1.6) 
enforce  » fj  =  oj  =  0  when  x*  is  not  equal  to  either  of  its  bounds,  we  can  write  the 
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optimality  conditions  (2.1.4)— (2-1.8)  in  what  is  for  us  a  computationally  more  useful  form 
involving  Z*  and  the  explicit  values  of  a  and  ry. 

Z*T(f  =  0,  (2.1.9) 

(uj  -  x*)(g*  -  a*TX*)  >  0  for  j  =  l,...,n,  (2.1.10) 

(x*  -  lj)(g*  -  a*T X*)  <  0  for  j  =  l,...,n.  (2.1.11) 

Optimal  variables  x*  lying  between  their  bounds  will  have  the  corresponding  Lagrange 
multipliers  or  reduced  costs  equal  to  zero: 

°i  =  m  =  9*  ~  af**  =  0.  (2.1.12) 

Optimality  of  the  Lagrange  multipliers  requires  nonnegative  (nonpositive)  reduced  costs 
for  variables  on  lower  (upper)  bounds. 

2.2.  Expansion  of  the  model  algorithm 

Section  1  presented  a  model  algorithm.  The  optimality  procedures  within  Algorithm  1.3.1 
consisted  of  ( 1)  solving  a  QP  subproblem,  (2)  performing  a  linesearch  in  conjunction  with  a 
merit  function,  and  (3)  updating  nonlinear  quantities  including  the  Hessian  approximation 
to  be  used  in  the  next  QP  subproblem.  A  discussion  of  each  of  these  steps  follows. 

2.2.1.  Subproblem  definition 

A  number  of  methods  have  been  proposed  for  solving  large-scale  NLP.  Two  mentioned 
in  this  section  are  SQP  methods  and  a  Lagrangian  method  [MurS82].  Although  not  an 
SQP  method,  the  Lagrangian  method  is  mentioned  here  because  it  offers  one  of  the  few 
efficient  methods  currently  available  for  large-scale  problems.  Also,  the  numerical  results 
of  Section  5  compare  the  SQP  algorithms  presented  in  this  report  with  the  Lagrangian 
method  implemented  in  the  form  of  MINOS.  Both  approaches  use  linearly  constrained 
subproblems.  The  Lagrangian  method  uses  an  augmented  Lagrangian  as  the  subproblem 
objective,  while  SQP  methods  use  a  quadratic  approximation  to  the  Lagrangian.  Both 
methods  involve  the  use  of  major  and  minor  iterations.  A  major  iteration  is  defined  to 
be  the  set  of  steps  required  to  form  and  solve  a  single  subproblem,  while  a  minor  iteration 
constitutes  a  single  iteration  within  a  subproblem. 

2.2.2.  Linearly  constrained  subproblems 

We  may  obtain  a  linearly  constrained  subproblem  from  NLP  by  replacing  c(x)  with  a 
linear  approximation  from  its  Taylor  series  expansion: 

c(  xk  +  p)ssc(  xk )  +  ./( xk  )p,  (2.2. 1 ) 

where  J(xk).  is  the  Jacobian  of  c  evaluated  at  xk.  Define  JE[x k)  and  Jt(xk)  to  be  the 
Jacobian  of  the  nonlinear  equality  and  inequality  constraints  respectively.  The  linearized 
constraints  are  then 


JeP  =  ~cE(xk)  and 

■hv  >  ~c,(xk). 


(2.2.2) 

(2.2.3) 
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As  described  earlier,  it  is  computationally  convenient  to  convert  the  linearly  constrained 
subproblem  so  that  all  general  constraints  are  equalities: 

{Jk  -/)^  =  -(c-s).  (2.2.4) 

The  only  inequalities  are  then  simple  bounds  on  the  variables: 

l-xk<  <u-  xk.  (2.2.5) 

The  linearly  constrained  subproblem  (LCS)  can  now  be  written 

minimize  F{xk  +  p) 

p-'i 

s.t.  Jkp  -  q  =  (ck  -  sk),  LCS 

^  <  (  J  )  <  uk, 

where  /<•  =  /-  xk  and  uk  =  u  -  xk. 

The  subproblem  objective  in  LCS  may  have  many  forms.  For  example  in  the  La- 
grangian  algorithm  of  Murtagh  and  Saunders  [MurS82],  F  takes  the  form  of  an  augmented 
Lagrangian: 

F(x,  A ,p)  =  F(x)  -  A T(c  -  Akp)  +  j/)||c  -  Akp\\2,  (2.2.6) 

where  c  =  c(x)-  ck  and  p  =  x  -  xk.  Note  that  this  subproblem  objective  requires  evalua¬ 
tions  of  both  F(x)  and  c{x)  and  possibly  their  derivatives  during  each  minor  iteration  of 
the  subproblem. 


2.2.3.  QP  subproblems 

SQP  methods  use  the  same  linearized  constraints,  but  the  subproblem  objective  F  is  a 
local  quadratic  model  of  the  Lagrangian 

L(x,  A)  =  F(x)  -  XTc(x)  -  tjt(x  —  l)  -  oT{u  -  x).  (2.2.7) 


The  first  two  terms  of  the  Taylor  expansion  of  (2.2.7)  define  the  quadratic  model: 

F  =  Qk(p)  =  \pTHkp  +  gjp,  (2.2.8) 

where  Hk  is  an  approximation  to  the  Hessian  of  the  Lagrangian  ( Hk  «  W(xk,Xk))  and 
gk  =  VF(n).  One  resulting  QP  subproblem  at  the  point  xk  is 


minimize 

p,q 

Quip )  =  \pTlhp  +  gjp 

s.t. 

Jup  -  (J  =  ~(ck  -  sk), 

/  \ 

QP[*k) 


whose  solution  we  denote  by  (pk,g k)  with  Lagrange  multipliers  pk. 


2.2  Expansion  of  the  model  algorithm 
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Note  that  the  linear  term  in  Qk(p )  is  defined  using  gu  =  VF(xk)  and  not  VXL.  In 
general,  replacing  VXL  by  gk  in  (2.2.8)  may  affect  the  sequence  {x*}.  However,  if  q  = 
and  a  =  aQP  (the  optimal  QP  multipliers  from  the  solution  of  QP[xk)),  then  examination 
of  the  KKT  conditions  for  the  QP  subproblem  shows  that  pk(gk)  —  Pk(VxL).  To  see  why, 
note  that  the  KKT  necessary  and  sufficient  conditions  require  only  that  pk  satisfy 

ZTVQk(Pk)  =  0.  (2.2.9) 

Define  A  =  (J  —  I).  Recall  from  (2.1.10)  and  (2.1.11)  that  the  terms  q  and  a  correspond 
to  the  reduced  costs  (g  -  ATA);  hence,  the  only  difference  between  the  QP  gradient  in 
(2.2.8)  and  that  of  the  model  function  defined  in  terms  of  VXL  is  the  addition  of  linear 
terms  involving  A.  These  terms  are  annihilated  by  premultiplication  by  ZT  in  (2.2.9); 
hence  the  solution  pk  of  the  subproblem  QP(xk)  is  unaltered. 

A  side-effect  is  that,  when  (2.2.8)  is  used  as  the  objective  in  QP{xk),  the  optimal 
Lagrange  multipliers  ft*  for  the  constraints  JkP  =  -(ck  -  Sk)  of  QP(xk)  are  used  as 
estimates  of  A*  (rather  than  as  a  search  direction  for  A*). 

To  obtain  fast  convergence  it  is  necessary  to  include  approximations  to  the  second- 
order  constraint  terms,  which  are  part  of  IV,  the  Hessian  of  the  Lagrangian: 

m 

lV(x,,A,)  =  G(xk)  -  £(  A*  ),-<?,•(**). 

1=1 

Also  note  that  errors  in  A*  and  p*  will  affect  only  the  second-order  terms  in  the  model 
function.  This  gives  some  insight  into  why  these  methods  are  superlinearly  convergent. 

Using  the  subproblem  defined  by  QP(xk)  does  not  require  evaluations  of  the  true 
objective  F(x)  or  constraints  c(x)  during  minor  iterations  of  the  subproblem.  This  can  be 
a  great  advantage  in  some  applications. 


2.2.4.  The  merit  function 

As  discussed  in  Section  1,  much  research  has  been  undertaken  on  the  form  of  the  merit 
function  for  measuring  the  progress  of  an  SQP  method.  As  in  [GMSYV86b]  and  [Pri89]  we 
use  an  augmented  Lagrangian  merit  function  and  include  slack  variables  for  the  nonlinear 
constraints  in  the  merit  function: 

M(x,s,\,p)  =  F(x)  -  A r{c(x)  -  s)  +  \p\\(c(x)  -  s)||2.  (2.2.10) 

As  in  NPSOL  [GMSW86b],  the  slacks  s  =  (s\, . . . ,sm)  are  specially  constructed  for  the 
linesearch  function: 


0 


s,  = 


max(0,c,(x)) 
max(0,  e,(x)  -  A  ,/p) 


if  i  G  £, 

if  *  €  I  and  p  =  0, 
otherwise. 


(2.2.11) 


where  £  and  1  denote  the  sets  of  indices  for  the  nonlinear  equality  and  inequality  con¬ 
straints  respectively.  Choosing  Sk  in  this  way  is  equivalent  to  setting  the  slacks  at  their 
optimal  feasible  values  if  the  merit  function  were  being  minimized  only  with  respect  to 
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s.  The  merit  function  is  thereby  reduced.  Define  =  p*  -  A*  as  the  search  direction 
for  the  Lagrange  multiplier  estimates  Ajt  and  define  qk  =  AkPk  +  c*  -  sjt  as  the  search 
direction  for  the  slacks  just  defined.  Finally,  define  pk  from  the  solution  of  QP{xk)  as 
the  search  direction  for  the  variables  X*.  To  obtain  the  next  iterate  the  linesearch  for  the 
merit  function  is  then  performed  along  the  triple  search  direction: 


**+i 

Sk+l 

Afc+i 


(  xk 
Sk 

V 


(2.2.12) 


The  requirement  is  that  at  the  new  iterate  the  merit  function  be  reduced  by  a  “sufficient” 
amount.  There  are  a  various  ways  to  define  “sufficient”.  Let  4>(a,p)  (or  sometimes  4>(a)) 
denote  the  (linesearch)  merit  function: 


<p(ct,p)  =  M  ( xk  +  apk.Sk  +  oc/t,  A*.  +  aZk,p).  (2.2.13) 

In  our  algorithm  we  shall  choose  a  to  ensure  the  following  conditions: 

4>(a)-4>(0)  <  <ra<)>\ 0),  (2.2.14) 

k'(a)|  <  -#'(  0),  (2.2.15) 

where  0  <  a  <  g  <  For  a  proof  that  a  point  satisfying  these  conditions  exists,  see 
[GMSW861)]  and  [MorS84], 


2.2.5.  Choice  of  the  penalty  parameter 

The  performance  of  the  SQP  algorithm  depends  on  the  choice  of  the  penalty  parameter  p. 
In  practice  it  is  worthwhile  having  a  parameter  for  each  nonlinear  constraint  even  though 
for  theoretical  purposes  a  single  parameter  p  would  suffice.  We  take  p  to  be  an  m-vector 
of  penalty  parameters  and  we  define  D  =  diag(p,),  where  pi  is  the  penalty  parameter  for 
the  i-th  constraint.  With  this  definition  the  merit  function  becomes 

M(x,s,\,p )  =  F(x)  -  A T(c(x)  -  s)  +  %(c(x)  -  s)TD(c(x)  —  s).  (2.2.16) 

At  each  iteration  the  vector  p  may  be  modified  to  ensure  that  the  merit  function  is  reduced 
by  a  sufficient  amount. 

We  now  omit  the  subscripts  k.  Let  (j)'  denote  the  derivative  of  4>  with  respect  to  a.  We 
can  show  that 

4/(0)  =  gTp  +  (2A  -  /t)T(c  -  a)  -  (c  -  s)tD(c  -  s).  (2.2.17) 

To  achieve  reduction  of  the  merit  function  in  the  linesearch  we  require 

4/(0  )<-\pTHp  (2.2.18) 

(see  [GMSW86b]).  When  the  nonlinear  constraints  c(x)  are  violated,  4>'(0)  may  not  satisfy 
(2.2.18)  for  the  current  value  of  the  vector  p,  which  then  must  be  modified. 

To  obtain  a  sufficient  reduction  in  the  merit  function  when  p,  =  p,  it  can  be  shown  by 
rearranging  terms  in  (2.2.17)  that  the  minimum  value  of  p  that  ensures  (2.2.18)  is  given 
by 

_  _  gTp  -f(2A  -  p)T(c  -  s)  +  \pTHp 


2 


(2.2.19) 


2.2  Exjxinsion  of  the  model  algorithm 
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Clearly,  other  choices  of  p,  will  also  ensure  condition  (2.2.18)  is  satisfied.  Two  natural 
questions  are:  What  is  a  “good”  choice  for  pi  that  also  ensures  (2.2.18)?  and:  What  is 
an  adequate  measure  of  goodness? 

As  in  the  code  NPSOL  [GMSW86a],  one  possibility  is  to  minimize  the  two-norm  of  p. 
Define  r  =  (rj, . . . ,  rm)  where  r,  =  (c,  -  s,)2  and  0  =  gTp  +  ( 2A  -  p)T(c  -  s)  +  %pTHp.  The 
choice  of  parameters  under  this  condition  can  be  expressed  as  the  solution  to  the  following 
problem: 


...  i  T 

minimize  kp  p 

p€%m  2' 

s.t.  rTp  >  0,  (2.2.20) 

p  >  0. 

The  solution  p*  of  this  optimization  problem  is  easily  found,  as  shown  by  the  following 
lemma. 

Lemma  2.2.1.  Ford  >  0  the  minimum-euclidean-norm  choice  of  the  m-vector  of  penalty 
parameters  for  the  augmented  Lagrangian  merit  function  (2.2.16)  is  given  by  p*  =  Ar, 
where  X  —  0/rTr. 

Proof.  Let  X  and  p  be  Lagrange  multipliers  for  the  inequality  constraints.  The  KKT 
conditions  for  (2.2.20)  are 


P 

X(rTp*  -6) 

PiPi 

X 

Pi 

* 

P 


> 


> 

> 

> 


0, 

Xr  -|-  p, 
0. 

0, 

0, 

0, 

0. 


Since  the  objective  function  is  strictly  convex,  (2.2.20)  has  a  unique  solution.  It  may  be 
verified  that  the  above  equations  are  satisfied  by  p *  =  Xr  and  p  =  0,  where  A  =  0/rTr.  | 
In  our  implementation  of  the  prototype  algorithm  (Algorithm  2.3.1),  we  increase  pi  to 
the  value  p*  when  (2.2.18)  is  violated. 


2.2.6.  Decreasing  p 

We  use  the  result  of  Lemma  2.2.1  to  increase  the  value  of  each  p,  when  it  is  smaller  than 
its  optimal  value.  As  mentioned  in  Section  1.  it  is  also  necessary  to  ensure  that  p  is  not 
too  large.  When  p,  is  larger  than  p*  it  is  possible  to  reduce  pi  and  still  satisfy  condition 
(2.2.18).  In  these  instances  we  compute  a  trial  value  p,  that  is  equal  to  the  geometric 
mean  of  the  previous  p  and  a  damped  value  of  p*.  The  trial  value  is 


Pi  =  \Jpi(f>k  +  />*), 


(2.2.21) 
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where  St  >  1  is  a  damping  parameter,  and  the  new  />,-  is  defined  as 

f  Pi  'f  Pi  <  \pu 
|  p,  otherwise. 


(2.2.22) 


To  avoid  too  many  modifications  of  p,  each  time  any  element  of  p  changes,  the  damping 
parameter  St  is  increased  by  a  factor  of  two.  This  ensures  that  p,  will  oscillate  only  a 
finite  number  of  times. 


2.2.7.  Updates  to  the  QP  Hessian 

Upon  completion  of  the  linesearch  procedure  it  is  necessary  to  set  up  the  next  subproblem. 
This  consists  of  evaluating  the  gradient  <7,  the  nonlinear  constraints  c  and  the  Jacobian 
J  at  the  new  iterate.  The  new  QP  subproblem  also  requires  H,  an  approximation  to  W. 
In  dense  SQP  algorithms,  H  is  usually  taken  as  a  quasi-Newton  approximation  to  W.  As 
mentioned  in  Section  1,  for  the  large-scale  case  this  can  be  computationally  prohibitive. 
Methods  for  obtaining  H  in  which  only  an  approximation  to  the  reduced  Hessian  ZtHZ 
is  maintained  are  discussed  in  detail  in  Section  3. 


2.3.  The  prototype  algorithm 

Taking  into  account  the  descriptions  of  Section  2.2  we  can  embellish  the  model  algorithm 
of  Section  1.  The  new  algorithm  builds  on  the  framework  of  the  model  algorithm  by 
requiring 

•  use  of  the  augmented  Lagrangian  merit  function  (2.2.16); 

•  estimation  of  Lagrange  multipliers; 

•  use  of  the  reduced  Hessian. 

We  now  present  a  prototype  SQP  algorithm  for  NLP.  The  main  steps  are  summarized  in 
Algorithm  2.3.1. 


2.3.1.  Convergence  of  the  prototype  algorithm 

The  prototype  algorithm  draws  on  the  work  of  Prieto  [Pri89]  and  Gurwitz  [Gur87].  It 
solves  a  sequence  of  problems  of  the  form  QP(xt),  giving  a  sequence  of  solutions  {x*}. 
We  make  the  following  assumptions: 

Aj.  The  SQP  iterates  {x/J  all  lie  in  a  closed,  bounded  region  ft  C  Rn. 

A? ■  The  objective  F(x)  and  the  constraints  c,(x)  and  their  first  and  second 
derivatives  are  continuous  and  uniformly  bounded  in  norm  on  ft. 

A3.  The  Jacobian  of  active  constraints  at  any  limit  point  of  {x;t}£i0  has  full  row 
rank. 

A,|.  There  exists  a  feasible  point  for  each  QP  subproblem. 


2.3  The  prototy\*z  algorithm 
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Algorithm  2.3.1.  (Prototype  SQP  algorithm) 

Start  with  estimates  of  the  solution  xq,  multipliers  Ao,  and  reduced  Hessian  of 
the  Lagrangian  Hq. 

while  not  converged  do 

Evaluate  the  Jacobian  J( x)  and  set  tip  the  subproblem  QP{x). 

Find  a  constrained  stationary  point  p  of  QP(x)  with  associated  multipliers  p. 
if  p  =  0  and  convergence  criteria  are  satisfied  then 
converged  =  true 
else 

Compute  slacks  s  for  the  merit  function  and  search  directions  for  multi¬ 
pliers  and  slacks:  £  =  p  —  A,  and  q  =  Ap  +  (c  —  s). 

Ujxlate  the  diagonal  penalty  matrix  D  =  diag(pt)  if  necessary. 

Compute  the  steplength  a  satisfying  steplength  criteria  for  the  merit 
function  M(x,s,  X,p).  The  linesearch  is  jjerformed  on  the  variables  x,  s. 
and  A  along  corresponding  search  directions  p,  r/,  and 

Update  x  —  x  +  ap  and  A  —  A  +  n£. 

Updute  the  reduced  Hessian  approximation  ZTI1 Z. 

end  if 
end  do 


Figure  1.  Prototype  SQP  algorithm 

As.  Strict  complementarity  holds  for  each  constrained  stationary  point  of  NLP 
in  0. 

Aa.  The  reduced  Hessian  of  the  Lagrangian  is  nonsingular  at  all  I\KT  points  of 
NLP. 


2.3.2.  Global  convergence 

Prieto  [Pri89]  in  his  doctoral  dissertation  analyzed  the  convergence  properties  of  a  reduced- 
llessian  algorithm  based  on  the  use  of  the  Lagrangian  to  define  the  QP  subproblem.  He 
proved  under  assumptions  A\-Aq  and  certain  conditions  on  the  multiplier  estimates  that 
the  algorithm  is  globally  convergent.  (These  conditions  are  satisfied  by  the  estimates 
in  our  prototype  SQP  algorithm.)  The  main  theorem  and  an  important  corollary  from 
this  work  are  repeated  here.  Theorem  2.3.1  and  Corollary  2.3.1  are  proved  in  [Pri89]  as 
Corollary  5.2.1  and  Corollary  5.2.2  respectively. 
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Theorem  2.3.1.  Under  assumptions  A\-A$  and  the  additional  assumption  that 
pk  =  \*  +  0(\\xk-x*\\), 

lim  \\xk  -  **||  =  0. 

k—*oo 

Corollary  2.3.1.  Under  assumptions  Ai~Ae  and  the  additional  assumption  that 
/t,  =  X* +  0(\\xk-x*\\l 

lim  || At  -  A*  ||  =  0. 

fc— *O0 


2.3.3.  Rate  of  convergence 

In  addition  to  global  convergence,  we  are  naturally  interested  in  the  rate  of  convergence. 
We  have  assumed  that  our  approximation  to  the  Hessian  is  only  accurate  on  the  null 
space  of  the  active  constraints.  A  consequence  of  the  use  of  less  precise  information  is 
a  degradation  in  the  rate  of  convergence  for  the  algorithm,  relative  to  one  in  which  the 
full  Hessian  is  available  or  approximated.  It  is  shown  in  [Pri89]  that  provided  the  penalty 
parameter  is  chosen  to  be  sufficiently  large  and  Hz  is  a  sufficiently  good  approximation 
to  the  reduced  Hessian  of  the  Lagrangian,  the  algorithm  converges  two-step  superlinearly. 
That  is,  the  iterative  sequence  {ajt}  satisfies 


lim 


IN+2  ~  «*ll 
IN-**!! 


-  o. 


(2.3.1) 


The  precise  conditions  are: 

A?.  When  the  iterates  are  close  to  the  solution,  the  penalty  parameter  is  chosen 
to  be  “large  enough”. 

.4 8.  || Zl(Hk  -  Wk)ZkPzk\ |  =  ofM),  where  W*  denotes  the  Hessian  of  the  La¬ 

grangian  at  Xk. 


The  theorem  giving  the  required  rate  of  convergence  (Theorem  5.3.1  in  [Pri89])  is  stated 
here  without  proof. 

Theorem  2.3.2.  There  exists  a  value  p,  such  that  if  pk  is  selected  satisfying  pk  >  p,  and 
if  assumptions  A\-A%  and  the  additional  assumption  that  pk  -  A*  +  0(|N  —  x*||)  are 
satisfied,  then  the  algorithm  converges  two-step  stij)erlinearly. 

2.4.  Solution  of  the  subproblem 

The  method  used  to  solve  the  QP  subproblems  is  an  active-set  method.  It  is  related  to 
the  reduced-gradient  method  jus  implemented  in  MINOS  [MurS78]. 

As  in  MINOS,  it  is  computationally  convenient  to  convert  all  general  constraints  to 
equalities,  with  the  only  inequalities  being  simple  bounds  on  the  variables.  For  notational 
convenience,  the  search  direction  for  the  subproblem  QP(xk)  is  augmented  to  include  the 
slacks  g.  Define  Ak  =  (  Jk  -I  )  and  define  gk  €  ftn+m  to  be  the  original  gradient  vector 


2.4  Solution  of  the  subproblem 
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augmented  by  m  zeros.  Likewise,  the  QP  Hessian  is  augmented  with  zeros  so  that  it  has 
dimension  n  +  m: 


(2.4.1) 


We  may  write  the  new  QP  subproblem  in  standard  form  as 


minimize 

pgjfn+m 

Qk(p)  =  \pTHkp  +  gjp 

S.t. 

AkP  =  -(cfc  -  sfc). 

4  <  p  <  uk , 

QPSF\xk) 


with  optimal  solution  p*  and  optimal  Lagrange  multipliers  p* . 

At  a  local  minimizer  of  QPSF\xk),  the  point  p*  satisfies  Akp *  —  —{ck—s\ t).  In  addition, 
many  variables  p*  (usually)  attain  the  value  of  one  of  their  bounds.  It  is  of  interest  to 
consider  the  set  of  indices  corresponding  to  the  bounds  on  p*  that  are  exactly  satisfied 
(i.e.  p*  =  (lk)j  or  ( uk)j ).  We  call  this  index  set  of  constraints  that  are  “tight”  or  “active” 
at  the  solution  the  active  set.  The  active  set  for  the  subproblem  can  be  represented  by 
the  active-set  matrix, 

A=(A  /  )  '  (2-4.2) 

where  N  consists  of  the  columns  of  the  linearized  constraints  corresponding  to  variables 
exactly  equal  to  one  of  their  bounds  at  the  optimal  solution  of  the  subproblem.  The 
columns  of  A  correspond  to  the  remaining  variables. 

If  the  active  set  were  known  a  priori,  the  solution  to  the  subproblem  could  be  solved 
in  a  single  iteration.  In  general  we  do  not  know  the  active  set  at  the  start  of  the  solution 
process  for  a  subproblem.  Active-set  methods  employ  what  is  called  a  working  set,  which 
attempts  to  predict  the  active  set.  The  working-set  matrix  A  has  the  form 


(2.4.3) 


where  N  consists  of  the  columns  of  A  corresponding  to  variables  temporarily  held  at 
their  current  values  (typically  on  a  bound),  and  B  and  S  are  the  remaining  columns  of 
A  partitioned  so  that  B  is  nonsingular.  We  refer  to  the  variables  corresponding  to  the 
columns  in  B,  S,  and  N  as  basic,  sujjerbasic,  and  nonbasic  variables  respectively. 

Active-set  methods  employ  a  procedure  to  check  whether  a  feasible  stationary  point  is 


optimal  (i.e.  when  the  working  set  has  identified  the  active  set).  Let  g„p  =  II kp  +  gk  and 
omit  the  subscript  k.  Consider  the  first-order  KKT  conditions  for  (/>  ,/i*)  to  be  a  local 
minimum  of  the  QP  subproblem: 


Ap* 

= 

-(<*  -  •!*)> 

(2.4.4) 

1* 

> 

(2.4.5) 

* 

P 

< 

«, 

(2.4.6) 

V*T  * 

Z  Uqp 

- 

0, 

(2.4.7) 

*T  ♦. 

> 

0  for;  =  l,...,n, 

(2.4.8) 
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Q  <9 


(Pj  -  h)((9%p)j  ~  «j  V )  <  0  for  j  =  1, . . . ,  n.  (2.4.9) 

*r 

This  means  that  p*  must  be  feasible  and  the  reduced  gradients  (g^P)j  -  a*  p  must  be  zero 
for  any  variable  not  on  a  bound  (including  for  example  a  free  variable  that  is  nonbasic). 
Let  p  be  a  feasible  constrained  stationary  point  for  the  subproblem.  If  KKT  conditions 
(2.4.8)-(2.4.9)  hold  then  the  active  set  has  been  identified  and  p  is  an  optimal  solution 
to  the  subproblem.  Verification  of  (2.4.8)  and  (2.4.9)  for  nonbasic  variables  is  carried  out 
in  the  process  of  solving  the  QP  subproblem.  This  procedure  is  known  as  pricing  and 
is  used  to  modify  the  working  set.  When  a  nonbasic  variable  fails  the  pricing  test,  the 
QP  objective  can  be  reduce  by  deleting  the  variable  from  the  working  set  and  moving  the 
variable  off  its  bound.  These  ideas  give  rise  to  a  model  algorithm  for  QP  subproblems. 


Algorithm  2.4.1.  (Model  QP  algorithm) 

Find  a  feasible  point  for  the  QP  subproblem.  This  defines  a  working  set. 
while  not  converged  do 

While  remaining  feasible,  find  a  constrained  stationary  point  for  the  QP 
subproblem.  This  process  may  increase  the  she  of  the  working  set  as  one  or 
more  of  the  basic  or  superbasic  variables  encounter  a  bound. 

Price  to  determine  if  the  working-set  size  shoidd  be  reduced. 

Modify  the  working  set  by  allowing  one  or  more  nonbasic  variables  to  move 
off  a  bound. 

end  do 


Figure  2.  Model  QP  algorithm 


Chapter  3 

Large-scale  Quadratic  Programs 


In  this  chapter,  important  computational  constructs  are  developed  to  assist  solution  of 
the  large-scale  QP  subproblems  arising  in  the  SQP  method. 


3.1.  The  null-space  basis  Z 

Recall  from  Section  2  that  during  the  solution  of  a  QP  subproblem  it  is  necessary  to 
maintain  Z ,  a  null-space  basis  of  the  working-set  matrix  A.  This  is  because  the  optimality 
conditions  for  the  QP  subproblem  depend  in  part  on  Z  and,  as  demonstrated  in  Section  3.2, 
the  form  of  the  QP  Hessian  also  depends  on  the  form  of  Z. 

For  the  dense  case  of  NLP,  Gill  et  al.  [GMSW84,  GMSSW85]  have  used  an  orthonormal 
basis  Z  obtained  by  ujxiating  the  rows  and  columns  of  the  TQ  factorization 

A=  (  0  T)Q,  (3.1.1) 

where  Q  is  an  nxn  orthogonal  matrix,  and  T  is  a  triangular  matrix  with  varying  dimension 
t.  In  this  case  Q  can  be  partitioned  as 

n—t  t 

Q  =  (  "z"  'Y'  ).  (3.1.2) 

Forming  an  orthogonal  Q  in  the  large-scale  case  is  prohibitively  expensive  in  general.  A 
practical  method  when  n  is  large  is  to  represent  Z  as  the  reduced-gradient  null-space  basis 
of  the  working-set  matrix  A  (2.4.3).  This  has  been  used  with  success  in  the  mathematical 
programming  system  MINOS  [MurS78,  MnrS87]  and  has  the  form 


(3.1.3) 


As  long  as  B  in  (2.4.3)  is  nonsingular,  Z  in  (3.1.3)  is  a  basis  for  the  null  space  of  A. 
In  contrast  to  the  dense  case,  this  matrix  Z  is  not  computed  or  stored  explicitly.  Instead, 
a  sparse  factorization  B  =  LU  is  maintained  along  with  an  index  set  for  the  columns  of 
B,  5,  and  N .  Products  involving  Z  and  ZTcan  then  be  performed  easily  bv  solving  with 
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B  or  Bt  and  using  the  nonzero  elements  of  the  columns  in  S.  For  example,  the  reduced 
gradient 

i)  =  ZTg  = 


may  be  obtained  from  the  operations 

solve  UT v 
solve  LTir 
form  i) 


STB~TgB 

(3.1.4) 

9b  ' 

(3.1.5) 

v; 

(3.1.6) 

9s  ~  ST 

(3.1.7) 

3.2.  Solution  of  subproblems 


SQP  methods  make  use  of  local  curvature  information  to  construct  QP  subproblems. 
Recall  from  Section  2  that  the  search  direction  p  for  each  SQP  iterate  is  constructed  from 
a  constrained  stationary  point  of  a  large-scale  QP  subproblem  (LSQP)  defined  at  the 
current  NLP  iterate  x: 


minimize 

\pTHp  +  gTp 

pgjjm  +  n 

s.t. 

Ap  =  -c, 

l  <  p  <  u. 

LSQP 


This  form  of  LSQP  is  slightly  different  from  the  subproblem  QP(n-)  defined  in  Section  2. 
Here  p  €  9i”*+n  is  a  search  direction  for  the  m  slack  variables  s  as  well  as  the  variables 
x.  Accordingly,  we  define  A  =  (  J(x)  -I  ),  where  «7(ar)  is  the  Jacobian  of  c(ar).  For  the 
sake  of  brevity,  the  right-hand-side  vector  for  LSQP  has  been  redefined  to  be  c  =  c(z)  —  a. 

Clearly,  the  ability  to  solve  large-scale  QP  subproblems  efficiently  is  crucial  to  our  SQP 
algorithm.  Because  we  have  some  freedom  in  determining  the  form  of  the  QP  Hessian  H , 
we  construct  H  to  be  positive  definite.  The  minimizer  of  the  subproblem  p*  is  then 
unique  (i.e.  a  global  minimizer).  In  our  algorithm,  H  will  not  be  explicitly  available.  A 
key  concept  is  to  work  with  a  nonsingular  matrix  Q  such  that  Q~l  and  QTHQ  have  a 
reasonably  simple  form.  The  forms  of  Q,Q~l  and  QTHQ  are  discussed  in  Section  3.5. 

At  each  iteration  of  most  active-set  methods  (and  all  of  the  methods  we  consider)  a 
KKT  system  is  solved: 

(3")U)-(:)- 

where  A  is  the  working-set  matrix  and  c  is  the  QP  right-hand-side  vector  padded  with 
zeros  to  make  it  compatible  with  A.  Nearly  all  active-set  methods  for  solving  LSQP 
generate  the  same  sequence  of  iterates  (see  [CotD79],  for  example).  The  methods  differ 
in  how  the  iterates  are  computed,  and  their  efficiency  depends  on  the  problem  type.  QP 
methods  may  be  categorized  according  to  the  approach  used  to  solve  (3.2.1).  If  the  system 
is  solved  directly  we  say  the  QP  method  is  a  Lagrangian  method.  If  (3.2.1)  is  reduced  to 
solving  two  smaller  systems  we  refer  to  the  method  as  a  projection  method. 

Let  Z  be  a  basis  for  the  null  space  of  A  and  define  Y  so  that  (  Z  Y  )  and  AY  are 
nonsingular.  Projection  methods  come  in  two  different  flavors:  range-space  methods  and 
null-space  methods  (see  [GMW81],  for  example).  This  terminology  arises  from  the  fact 
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that  A  defines  two  complementary  subspaces  spanned  by  Z  and  Y.  The  work  required  to 
solve  for  the  optimality  conditions  is  directly  related  to  the  dimension  of  either  the  null 
space  (the  dimension  of  ZTHZ)  or  the  range  space  (the  dimension  of  YTH~1Y).  Because 
we  use  a  null-space  method  for  solving  (3.2.1),  the  method  is  most  efficient  when  the 
dimension  of  the  null  space  of  A  is  small.  It  is  this  class  of  problems  that  we  are  most 
interested  in. 

Each  iteration  of  an  active-set  method  for  solving  LSQP  is  called  a  minor  iteration. 
Solving  the  associated  KKT  system  (3.2.1)  is  equivalent  to  solving  a  single  equality- 
constrained  quadratic  program  (EQP). 

The  following  is  a  model  active-set  algorithm  for  solving  LSQP.  It  assumes  that  a 
working  set  of  the  form  (2.4.3)  is  available  and  that  the  associated  point  p  is  feasible,  i.e. 
satisfies  the  constraints  in  LSQP. 


Algorithm  3.2.1.  (Active-set  algorithm) 
while  not  converged  do 

if  the  minimizer  has  been  found  on  the  current  subsjxice  then 
Price  nonbasic  variables. 
if  no  new  su])erbasic  candidates  exist  then 
converged  =  true 
else 

Delete  a  bound  from  the  working  set. 

end  if 
end  if 

if  not  converged  then 

Solve  the  EQP  defined  by  the  working  set. 
if  a  bound  was  encountered  in  the  solution  of  the  EQP  then 
Add  the  bound  encountered  to  the  working  set. 

end  if 
end  if 
end  do 


Figure  1.  Active-set  algorithm 


Our  definition  of  convergence  in  Algorithm  3.2.1  depends  on  finding  a  minimizer  of  LSQP. 
The  algorithm  is  easily  modified  to  halt  upon  finding  the  first  constrained  stationary 
point  (see  Section  3.2.3).  In  addition.  Algorithm  3.2.1  requires  finding  a  feasible  point,  the 
computation  of  which  is  itself  an  optimization  problem.  The  main  parts  of  Algorithm  3.2.1 
(signified  by  the  boxed  text)  will  be  discussed  in  the  following  sections. 
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3.2.1.  Finding  an  optimal  point 

Let  A  denote  the  working  set  for  a  feasible  point  p  for  LSQP.  Since  H  is  chosen  to  be 
positive  definite,  the  reduced  Hessian  ZTIIZ  is  known  a  priori  to  be  positive  definite 
for  every  minor  iteration.  Define  g gp  =  g  +  Up,  the  QP  gradient  at  the  point  p,  and 
Hz  =  ZtHZ,  the  reduced  Hessian. 

The  optimality-phase  algorithm  starts  with  a  feasible  point  p,  tolerances  6rg  and  6$, 
and  a  partition  of  A  into  (  B  S  N  ).  Define  ns  to  be  the  number  of  columns  in  S. 
Assume  that  the  QP  gradient  g gp  is  available,  along  with  factorizations  for  the  basis 
B  =  LI I  and  the  reduced  Hessian  Hz  =  RTR. 

The  first  step  in  finding  an  optimal  point  is  to  determine  whether  the  current  point  p 
is  a  constrained  stationary  point,  by  checking  whether  the  current  point  is  a  minimizer  on 
the  current  subspace  or  equivalently,  the  norm  of  the  reduced  gradient  is  zero  (or  smaller 
than  a  specified  tolerance).  If  so,  it  is  then  necessary  to  check  whether  p  is  the  minimizer 
of  the  subproblem,  by  checking  the  signs  of  the  multipliers  (also  called  reduced  costs) 
for  the  bound  constraints.  The  multipliers  are  calculated  by  pricing,  as  summarized  in 
Procedure  3.2.1. 


Procedure  3.2.1.  (Price  nonbasic  variables) 

Form  tfqp  =  g  +  Hp. 

Solve  UTLTp  =  (j/gple- 
Calculate  7/  =  (gQP)N  —  NTp. 

Select  rp  (i)„),  the  most  negative  (jtosilive)  element  of  i)  corresponding 
to  variables  at  their  lower  (upjter)  bounds. 


If  ip  >  -bjj  and  t)u  <  bjj,  we  conclude  that  p  is  a  minimizer  of  the  subproblem. 
Otherwise,  the  QP  objective  may  be  reduced  by  deleting  a  bound  from  the  working  set 
and  adding  a  variable  to  the  superbasic  set.  Procedure  3.2.2  summarizes  the  steps  required 
to  modify  the  working  set  when  a  bound  is  deleted. 


Procedure  3.2.2.  (Delete  a  bound  from  the  working  set) 

Choose  q  =  argmax(\i]t\,  i]u). 

Add  q  to  the  sujter  basic  index  set. 

Add  qq  as  a  new  element  of  ZTgqp. 

Add  a  new  column  to  R  and  increase  ns  by  1. 


The  search  direction  for  LSQP  at  p  is  defined  by  the  Newton  equations, 

(3.2.2) 

(3.2.3) 


H zPz  ^  9qp  ’ 

P  =  Zpz, 


which  solve  the  EQP  defined  by  the  current  working  set.  The  next  iterate  p  for  the 
subproblem  is  p  —  p  +  aft.  Due  to  the  quadratic  nature  of  the  objective  function  along 
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p,  when  p  +  p  is  feasible  then  a  =  1  and  p  is  a  constrained  stationary  point  for  LSQP.  If 
p  +  p  is  not  feasible  then  a  <  1  is  the  step  to  the  nearest  bound  constraint  along  p.  The 
precise  steps  for  solving  the  EQP  are  summarized  in  Procedure  3.2.3. 


Procedure  3.2.3.  (Solve  an  EQP) 

Solve  RTRps  —  -ZTgv. 

Solve  LUpB  =  -Sps. 

Calculate  ac,  the  step  to  the  nearest  bound  along  p,  where 

P  =  (  Pb,  Ps ,  0  ). 

Compute  a  =  min  ( 1 ,  ac ) . 

Update  p  =  p  +  ap. 


When  the  unit  step  is  not  feasible  the  nearest  bound  is  added  to  the  working  set.  If 
the  variable  corresponds  to  a  column  in  B.  a  column  from  S  must  be  chosen  to  replace 
it  in  B  (see  Section  3.3,  no.  3).  The  steps  required  when  a  bound  is  encountered  in  the 
solution  of  the  EQP  are  summarized  in  Procedure  3.2.4. 


Procedure  3.2.4.  (Add  a  bound  to  the  working  set) 
if  a  =  oc  then 

Add  the  new  bound  to  the  working  set. 

Decrease  ns  by  1. 

Ujxlate  R. 

Ujxlate  the  LU  factors  if  necessary. 

end  if 


Two  tests  for  convergence  are  required:  one  to  check  for  convergence  in  the  current 
subspace  and  one  to  detect  convergence  to  the  QP  minimizer.  As  discussed  in  Section  2.  it 
is  not  always  necessary  to  obtain  the  minimizer  of  the  QP  subproblem  in  order  to  obtain 
a  search  direction  for  NLP.  The  emphasis  of  Section  3.2.3  will  be  to  develop  a  strategy  in 
which  we  terminate  the  solution  of  the  subproblem  upon  finding  a  QP  stationary  point. 
The  complete  set  of  steps  for  the  optimality-phase  algorithm  for  finding  a  minimizer  of 
the  QP  is  given  in  Algorithm  3.2.2. 

3.2.2.  Obtaining  a  feasible  point 

Algorithm  3.2.2  requires  a  feasible  point  for  LSQP.  At  the  start  of  a  QP  subproblem,  the 
basic  variables  pB  are  defined  by 

DpB  =  -c. 

If  pB  is  feasible  we  may  commence  with  the  optimality  phase.  A  subproblem  is  infeasible 
only  if  the  bounds  on  the  variables  (/„  <  pB  <  un)  are  violated.  During  the  feasibility 
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Algorithm  3.2.2.  (Optimality  phase  for  LSQP) 
while  not  converged  do 

if  \\ZTgqp\\  <  6rg  then  { Price  nonbasic  variables } 

Form  </gp  =  g  +  Up. 

Solve  UtLth  =  (g^s- 
Calculate  rp  =  (<7<jp  )n  -  NTpi. 

Select  iji  (i]u),  the  most  negative  ( positive )  element  of 
rp  correspxrnding  to  variables  at  their  lower  (upper)  bounds. 

if  'Ji  >  -^dj  and  rpu  <  f>dj  then 
converged  =  true 

else  {Delete  a  bound  from  the  working  set} 

Choose  q  =  argmax(\i]i\,i]u). 

Add  q  to  the  suporbasic  index  set. 

Add  rpq  as  a  new  element  of  ZTg gp. 

Add  a  new  column  to  R  and  increase  ns  by  1. 

end  if 
end  if 

if  not  converged  then  {Solve  an  EQP } 

Solve  RTRps  =  -ZTgQP. 

Solve  LUpg  =  —Sps. 

Calcidate  ac,  the  step  to  the  nearest  bound  along  p  =  (  pB,  ps ,  0  ). 
Compute  a  =  min (l, a c). 

Upxlate  p  =  p- f  ap. 

if  a  =  oc  then  {Add  a  bound  to  the  working  set} 

Add  the  new  bound  to  the  working  set  and  decrease  ns  by  1. 
Upxlate  R  and  upxlate  the  LU  factors  if  necessary. 

end  if 

Upxlate  the  reduced  gradient. 

end  if 
end  do 


Figure  2.  Optimality  phase  for  LSQP 
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phase  of  the  subproblem,  the  objective  is  the  sum  of  infeasibilities  for  the  bounds.  This 
Phase  1  problem  may  be  written 


where  ; 3+  =  max(0,/?).  This  procedure  for  finding  a  feasible  point  is  similar  to  the  Phase  I 
method  for  finding  a  feasible  point  for  a  linear  program,  extended  to  work  with  nonbasic 
points  (i.e.  with  superbasic  variables).  The  difference  between  the  feasibility  phase  and 
the  optimality  phase  is  that  the  gradient  of  the  sum  of  infeasibilities  must  be  formed  in 
place  of  </gp,  and  the  steepest  descent  search  direction  is  used  in  place  of  a  search  direction 
defined  in  terms  of  the  reduced  Hessian  Hz. 

It  is  not  enough  just  to  find  a  feasible  point.  If  the  working  set  is  changed  in  the 
feasibility  phase  it  is  necessary  to  modify  //z,  since  this  matrix  is  required  for  the  optimality 
phase. 

When  the  working  set  is  modified  to  include  additional  bounds,  the  reduced  Hessian 
is  modified  within  the  feasibility  phase.  When  bounds  are  deleted  from  the  working  set 
there  are  two  options  worth  considering. 

Because  only  the  steepest  descent  direction  is  used  during  the  feasibility  phase,  one 
strategy  is  to  wait  until  a  feasible  point  has  been  found  before  expanding  the  reduced 
Hessian,  should  that  be  necessary.  When  the  working  set  has  been  modified  and  has  fewer 
bounds  than  at  the  start  of  the  feasibility  phase,  the  new  reduced  Hessian  may  be  modified 
by  appending  rows  and  columns  of  the  identity: 


(3.2.4) 


Another  strategy  is  to  update  Hz  —  RTR  to  take  into  account  deletions  from  the 
working  set  as  they  occur  in  the  feasibility  phase.  For  this  strategy,  the  updates  are  the 
same  as  those  for  the  optimality  phase  and  may  require  multiplications  with  the  full  H 
when  the  working-set  size  is  reduced.  The  modifications  to  Ilz  are  discussed  in  detail  in 
Section  3.7. 

The  feasibility-phase  algorithm  starts  with  tolerances  brg  and  6$  and  a  partition  of 
A  into  (  B  S  N  ).  Define  7is  to  be  the  number  of  columns  in  5.  Assume  that  the 
basis  factorization  B  =  LU  is  available,  set  ps  =  0  and  pN  =  0,  and  solve  LUpg  =  -c  to 
determine  the  initial  elements  of  the  search  direction. 


3.2.3.  Early  termination  of  subproblems 

The  prototype  SQP  algorithm  for  NLP  (Algorithm  2.3.1 )  allows  the  use  of  only  stationary 
points  rather  than  minimizcrs  to  construct  a  search  direction  for  the  NLP  merit  function. 
As  mentioned  in  Section  1,  there  are  two  reasons  for  early  termination  of  the  active-set 
method.  First,  it  is  desirable  to  place  a  limit  on  the  computational  effort  made.  Second, 
when  it  is  a  poor  approximation  to  x*  (and  this  is  the  circumstance  when  many  minor 
iterations  may  be  required),  the  effort  to  find  a  QP  minimizer  seems  unwarranted  in  light 
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Algorithm  3.2.3.  (Feasibility  phase  for  LSQP) 
while  not  (converged  or  infeasible)  do 

Form  g,  the  gradient  of  the  sum  of  infeasibilities. 
if  II^H  =  0  then 
converged  =  true 
else 

if  ||ZT</||  <  6rg  then  {Price  nonbasic  variables} 

Calculate  g  =  gN  -  NTp. 

Select  g i  (gu),  the  most  negative  (jx>sitive)  element  of  g  correspond¬ 
ing  to  variable  ■<  their  lower  (upper)  bounds. 

>f  gi  >  -<V  'i  g„  <  firtj  then 
infeasible  =  true 

else  {  Delete  a  bound  from  the  working  sef} 

I  Choose  q  =  argmax(\gi\,gu) 

Add  q  to  the  superbasic  index  set. 

Add  gq  as  a  new  element  of  ZTg  and  increase  ns  by  1. 

end  if 
end  if 

if  not  (converged  or  infeasible)  then  {Solve  an  EQP) 


end  if 
end  if 
end  do 


Figure  3.  Feasibility  phase  algorithm 
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of  the  fact  that  the  subproblem  may  be  a  poor  model  (even  locally).  Thus,  an  early- 
termination  strategy  may  reduce  the  total  number  of  QP  iterations  required  to  find  a 
minimizer  for  NLP. 

There  may  be  a  further  benefit  associated  with  early  termination.  The  pricing  step 
requires  much  computational  effort  (Procedure  3.2.1)  and  often  constitutes  a  large  per¬ 
centage  of  overall  processing  time  when  the  subproblem  is  solved  to  optimality.  Pricing 
involves  solving  with  BT  { to  obtain  /i)  and  forming  NTp  (to  obtain  reduced  costs  q).  The 
early-termination  strategy  only  requires  solves  with  B  (to  obtain  the  search  direction)  and 
a  linesearch.  If  a  unit  step  is  taken  we  terminate  the  solution  of  the  subproblem,  since 
the  resulting  point  is  a  constrained  stationary  point.  Otherwise,  a  bound  is  encountered 
during  the  linesearch,  the  reduced  gradient  is  updated  cheaply,  and  the  search  direction 
is  calculated  anew  in  the  smaller  subspace.  This  approach  requires  neither  Lagrange  mul¬ 
tiplier  estimates  nor  reduced  costs!  Hence,  an  important  advantage  of  early  termination 
of  subproblems  is  the  elimination  of  pricing  during  subproblem  solution  (although  it  is 
necessary  to  price  outside  the  subproblem  to  determine  if  the  working  set  should  be  mod¬ 
ified). 


Algorithm  3.2.4.  (First-stationary-point  algorithm  for  LSQP) 
while  not  converged  do 
if  \\2TgQP\\  <  bTg  then 
converged  =  true 
else  {Solve  an  EQP) 

Solve  RrRps  =  -ZTgQf>. 

Solve  LUpB  =  -Sps. 

Calcxdate  ac,  the  step  to  the  nearest  bound  along  p  —  (  pB,  ps,  0  )  . 
Compute  a  =  min  (l,oc). 

Ujxlate  p  =  p  +  ap. 

if  a  =  ac  then  {Add  a  bound  to  the  working  se/} 

Add  the  neiv  bound  to  the  working  set,  decrease  ns  by  1,  and  ujxlate 
R  and  the  LU  factors  if  necessary. 

end  if 

Update  the  reduced  gradient. 

end  if 
end  do 


Figure  4.  First  stationary  point  algorithm 
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With  the  early-termination  strategy,  we  have  two  options  for  modifying  Hz  (as  in  the 
feasibility  phase).  Upon  completion  of  a  major  iteration  we  may  decide  to  price  inside  or 
outside  the  subproblem.  After  completion  of  the  linesearch  to  reduce  the  merit  function, 
the  next  QP  is  set  up,  Lagrange  multiplier  estimates  are  then  calculated  and  nonbasic 
columns  are  priced.  If  the  current  point  is  not  the  minimizer,  a  decision  is  made  to  move 
off  one  or  more  of  the  bounds  in  the  working  set. 

Pricing  outside  the  subproblem  allows  the  reduced  Hessian  to  be  updated  to  reflect  the 
new  superbasic  components  in  a  computationally  convenient  way.  The  new  Hz  is  obtained 
by  appending  a  row  and  column  of  the  identity  for  each  new  superbasic  variable  as  in 
(3.2.4).  This  avoids  products  of  the  form  Hz ,  which  can  be  computationally  expensive. 
Note  that  a  major  difference  relative  to  the  algorithm  of  Prieto  [Pri89]  is  that  the  prototype 
algorithm  does  not  calculate  an  auxiliary  search  direction  once  a  stationary  point  has  been 
identified. 

Algorithm  3.2.4  presents  a  stationary-point  algorithm  for  LSQP  that  terminates  upon 
finding  the  first  constrained  stationary  point  encountered  during  subproblem  solution. 
The  First- stationary-point  algorithm  starts  with  a  feasible  point  p,  a  tolerance  6rg,  and  a 
partition  of  A  into  (  B  S  N  ).  Define  ns  to  be  the  number  of  columns  in  S.  Assume 
that  the  QP  gradient  g ^  =  g  +  Hp  is  available  and  factorizations  are  available  for  the 
basis  B  =  LU  and  the  reduced  Hessian  Hz  =  RTR. 


3.3.  Updating  Z 

When  A  is  updated  during  the  solution  of  the  subproblem  it  is  necessary  to  update  both 
Z  and  IIZ.  The  updates  to  the  working  set  come  in  three  forms: 

1.  .4  bound  is  deleted  and  the  corresponding  column  is  added  to  S. 

When  we  decide  to  drop  one  or  more  of  the  bound  constraints  from  the  working  set 
this  has  the  effect  of  adding  one  or  more  columns  to  the  matrix  5.  The  addition  of 
a  superbasic  also  increases  the  number  of  columns  of  Z.  and  the  dimension  of  Hz. 

2.  A  variable  corresponding  to  a  column  in  S  hits  a  bound. 

When  a  variable  corresponding  to  the  r/-th  column  in  S  encounters  a  bound,  the 
variable  is  deleted  from  S  and  added  to  N .  Both  Z  and  Hz  must  be  updated  to 
reflect  the  modification  to  S.  The  g-th  column  of  Z  is  implicitly  deleted,  and  R  is 
modified  to  reflect  deletion  of  the  <y-th  row  and  column  of  Hz. 

3.  .4  variable  corresponding  to  a  coiiimn  in  B  hits  a  bound. 

When  a  variable  corresponding  to  a  column  in  B  encounters  a  bound  the  updates 
to  Ilz  are  more  complicated.  It  is  necessary  to  replace  the  column  from  B  with  a 
suitable  column  from  .S'  (one  that  maintains  the  nonsingularity  of  B).  Updates  to 
the  LU  factors  of  B  are  carried  out  using  a  method  standard  to  the  simplex  method 
for  linear  programming  (see  [GMSW87]). 

The  updates  to  II 7  for  each  of  these  cases  are  discussed  in  Section  3.7. 


3.4  Continuity  of  Z 
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3.4.  Continuity  of  Z 

In  order  to  prove  that  Algorithm  2.3.1  has  a  superlinear  rate  of  convergence,  it  is  necessary 
to  assume  that  Hz  is  an  adequate  approximation  to  Z^XVZ.  In  Section  2  we  assumed  that 
the  gradients  and  Hessians  of  F(x)  and  c(x)  exist  and  are  continuous  and  uniformly 
bounded  in  norm  on  fi.  The  quasi-Newton  scheme  for  approximating  Z^W Z  is  based  on 
inherited  information.  If  ZTZ  is  not  continuously  differentiable  in  the  neighborhood  of  x * 
then  the  assumption  that  Hz  is  a  good  approximation  to  Z^V Z  is  not  reasonable. 

Discussion  of  the  continuity  of  Z  was  initiated  by  Coleman  and  Sorensen  [ColS84], 
who  showed  that  a  standard  method  for  computing  an  orthogonal  factorization  of  A  may 
not  provide  a  continuously  differentiable  Z(x).  Gill  et  al.  [GMSSW85]  showed  how  to 
compute  a  continuously  differentiable  Z  using  regularized  Householder  transformations, 
and  they  proved  the  convergence  of  both  Q  and  Z  under  appropriate  assumptions. 

The  difficulties  associated  with  the  continuity  of  Z(.r)  that  arise  using  orthogonal 
transformations  do  not  arise  for  the  reduced-gradient  form  of  Z,  as  the  following  lemma 
demonstrates. 

Lemma  3.4.1.  Let  the  sequence  {.?;•}  be  defined  by  the  prototyjie  SQP  algorithm  with 
limit  jx>int  x* .  Let  B  be  a  ball  around  a  point  x*  and  supjmse  that  the  correct  active  set 
has  been  identified  and 

A(xk)  =  (Bk  Sk  Nk  ) 

is  a  continuously  differentiable  function  of  xk  in  B.  Further,  suppose  that  Bk  has  rank  m 
for  all  xk  in  B  and  the  indices  defining  the  columns  in  Bk  aie  identical  for  all  xk  in  B. 
Then 

Bk  Sk  Nk 

\  I  I 

B*  S*  N* 

and  the  null-sjxice  basis  Z(xk)  obtained  as  the  reduced  gradient  matrix  from  (3.1.3)  has 
elements  that  are  continuously  differentiable  for  all  xk  in  B. 

Proof.  Since  Bk  has  the  same  indices  and  the  active  set  is  fixed  inside  B ,  by  definition 
of  the  active  set  the  result  A(xk)  —  A*  =  (  B*  S*  N*  )  must  follow.  In  addition, 
since  A(xk)  =  (  Bk  Sk  Nk  )  is  continuously  differentiable  for  xk  in  B  then  Bk  also  has 
these  properties.  Moreover,  since  Bk  is  continuously  differentiable  and  has  full  rank,  Bfx 
exists  and  is  also  continuously  differentiable.  Finally,  the  continuity  of  Z(xk)  follows  from 
the  fact  that  since  Sk  has  elements  that  are  continuously  differentiable  for  all  xk  in  B ,  the 
linear  transformation  Bfx  Sk  has  continuously  differentiable  elements  for  all  xk  in  B.  | 
The  fact  that  B^ 1  is  not  explicitly  computed,  but  operations  with  the  matrix  are  done 
using  the  LU  factors  of  Bk,  does  not  impact  the  continuity  of  Z.  Note  that  in  general  the 
LU  factors  are  not  continuously  differentiable  but  BJ)1  is. 

3.5.  The  QP  Hessian  and  the  transformation  matrix  Q 

In  our  prototype  algorithm  we  recur  //z,  an  approximation  to  the  reduced  Hessian  of  the 
Lagrangian.  Condition  A$  of  Section  2.3.1  requires  that  U .  the  approximation  to  IT,  be 
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accurate  only  in  the  null  space  of  the  rows  of  A.  We  are  free  to  define  H  in  any  way 
provided 

ZTliZ  =  Hz.  (3.5.1) 

It  is  important  to  note  that  although  (3.5.1)  must  hold,  the  matrix  product  is  never 
actually  formed. 

When  only  the  reduced  Hessian  is  recurred  it  is  not  obvious  how  the  QP  gradient  can 
be  formed  without  great  expense,  since  the  QP  gradient  depends  on  H .  To  form  the  QP 
gradient  in  the  dense  case  (when  H  is  known  explicitly)  we  would  simply  form 

Sqp  =  !l  +  Hp.  (3.5.2) 

In  the  large-scale  case,  H  is  not  stored  and  so  direct  multiplication  is  not  possible.  Fortu¬ 
nately,  we  have  considerable  freedom  in  the  definition  of  H  while  still  being  able  to  satisfy 
(3.5.1).  We  shall  use  this  freedom  to  make  the  computation  of  g^  easy. 

In  the  dense  SQP  method  of  NPSOL  [GMSWSGa],  an  important  computational  device 
has  been  to  work  with  a  transformed  Hessian  approximation  QTHQ  =  RTR,  where  Q  is 
the  nonsingular  matrix  that  triangularizes  the  working-set  matrix.  An  analogous  device  is 
essential  to  the  success  of  our  large-scale  algorithm.  We  define  the  transformation  matrix 
Q  to  be 

Q  —  (  Z(.r)  V(x)  )  ,  (3.5.3) 

where  Z  and  Y  satisfy  the  following  requirements: 

•  Z  is  a  basis  for  the  null  space  of  the  active  constraints  at  the  current  point; 
i.e.  A{x)Z(x)  =  0; 

•  Q  is  nonsingular. 

We  also  define 

(  Db  0  0  \ 

QT HQ  =  I  0  Hz  0  I  , 

\  0  0  dn  ) 

where  DB  and  DN  are  nonnegative  diagonal  matrices  each  having  zero  elements  on  its 
diagonals  corresponding  to  linear  or  slack  variables. 

Thus,  ZtHY  =  0  and  YTJIY  =  D,  and  the  gradient  for  the  QP  is  given  by 

fkr  =  U  +  Q-T(QTIIQ)Q-'p.  (3.5.4) 

Note  that  the  transformed  Hessian  approximation  QTHQ  is  known  and  is  simple,  but 
operations  with  Q~x  (rather  than  Q)  are  required  to  compute  g We  choose  the  null- 
space  basis  Z  to  have  the  reduced-gradient  form  (3.1.3)  but  we  have  some  freedom  in 
choosing  Y .  In  the  next  section  we  discuss  various  choices  for  Y .  Each  choice  gives  rise 
to  a  different  Q~l,  which  in  turn  affects  the  effort  required  to  compute  the  QP  gradient. 
The  merits  of  the  various  choices  are  then  compared. 


3.6  The  matrix  Y 
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3.6.  The  matrix  Y 

For  each  QP  subproblem  the  QP  Hessian  depends  only  on  Aq ,  the  working-set  matrix  at 
the  start  of  the  QP  subproblem.  Define 

A0s(B°  (3.6.1) 

The  null-space  basis  for  /4o  will  be  denoted  by  Zq  and  Y  for  /lo  will  be  denoted  by  Yo  with 
Qo  =  (  Vb  Z0  ).  The  definition  of  A  (i.e.  B,  S,  and  N),  Z,  and  Y  may  change  during 
the  solution  of  the  QP  subproblem,  but  the  matrix  H  remains  constant.  It  is  therefore 
necessary  to  remember  the  column  indices  in  Aq. 

In  this  section  we  define  three  possible  choices  for  Yo.  In  addition  to  requiring  Qo  to  be 
nonsingular  we  would  like  Qo  to  be  well-conditioned  and  operations  with  Qq  1  to  be  cheap. 
In  the  following  sections  the  subscripts  on  the  matrices  Yo.  Zo,  etc.  will  be  dropped  (e.g. 
Q  =  Qo)i  when  a  matrix  corresponds  to  a  QP  iteration  other  than  the  initial  one  it  will  be 
denoted  by  a  subscript  for  the  current  iteration  count  (e.g.  B  =  B(p o)  and  Bk  =  B(pk)). 


3.6.1.  Y  defined  using  a  partition  of  I 
Lemma  3.6.1.  When  Y  is  of  the  form 


(  I  °  \ 
1=  0  0 
\0  I) 


then  Q  is  nonsingular  and 


Q~l 


I  0 

I 

1° 


/  0  \ 
B~lS  0 
0  I  ) 


(3.6.2) 


(3.6.3) 


Proof.  Permutation  of  the  first  two  blocks  of  columns  from  Q  gives 


/  -B-'S  I 

(  7 


(3.6.4) 


which  shows  that  Q  is  nonsingular.  The  result  for  Q~x  may  be  shown  by  block  premulti¬ 
plication  of  (3.6.3)  by  Q.  | 

With  this  choice  of  Y  the  QP  gradient  becomes 


(Iqp  —  9  "b 


(pB  +  B  1 S  Ps ) 

StB-t(Pb  +  B-'Sps)  +  Hzps 

Pn 


(3.6.5) 


The  work  required  to  compute  consists  of  matrix-vector  products  with  S  and  5r, 
a  matrix-vector  product  with  //z,  and  two  solves  with  the  basis  (one  with  B  and  one  with 
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BT).  The  QP  gradient  can  be  obtained  as  follows: 


form 

w 

= 

Sps; 

solve 

Bv 

= 

w\ 

solve 

Btu 

= 

Pb  +  v\ 

form 

w 

= 

STu; 

form 

Q 

= 

B2ps\ 

(  <Jb\  I 

*  Pb  A-  v 

form 

(Jqp 

= 

*  M 

q  +  w 

V  9S  )  \ 

\  Pn 

(3.6.6) 

(3.6.7) 

(3.6.8) 

(3.6.9) 

(3.6.10) 

(3.6.11) 


We  may  simplify  the  computations  (3.6.6)-(3.6.10)  when  the  feasibility  phase  is  ter¬ 
minated  without  changing  the  working  set.  When  the  initial  p  is  feasible  with  respect  to 
its  bounds,  equations  (3.6.6)-(3.6.7)  can  be  omitted  from  the  computation. 

During  the  solution  of  a  subproblem  the  factors  LkUk  of  the  basis  Bk  change  as  the 
working  set  changes  (refer  to  Procedure  3.2.4,  for  example).  When  Y  is  of  the  form  (3.6.2) 
we  must  maintain  a  factorization  of  both  B  and  Bk  since  the  use  of  Q  requires  the  use  of 
Bq  1 .  To  avoid  having  to  store  two  separate  LU  factorizations,  one  for  B  and  one  for  Bk,  we 
could  obtain  Bk  from  Bk-i  using  a  classical  prodtict-form  update  [Orc68]  (or  alternatively 
a  blocfc-LU  update  as  described  in  Eldersveld  and  Saunders  [EldS90]).  This  would  allow 
us  to  perform  operations  with  both  B  and  Bk  with  no  increase  in  storage  over  that  for  just 
Bk .  However,  difficulty  arises  if  the  number  of  updates  to  B  becomes  large.  Normally,  Bk 
is  refactorized  every  50-150  minor  iterations.  Thus,  if  the  QP  solution  process  caused  B 
to  undergo  many  updates  it  would  still  be  necessary  to  store  two  factorizations. 


3.6.2.  An  orthogonal  Y 

Following  the  lead  established  by  using  the  orthogonal  TQ  factorization  for  the  dense  case, 
we  can  choose  Y  to  satisfy  ZTY  =  0.  Unlike  the  Y  obtained  from  the  TQ  factorization 
we  shall  not  require  Y*Y  =  /.  This  choice  ensures  that  Q  is  nonsingular  if  Y  has  full 
column  rank.  We  would  also  expect  Q  to  be  similarly  conditioned  to  Z  provided  Y  is  also 
similarly  conditioned  to  Z. 

There  are  many  possible  choices  for  an  orthogonal  Y.  An  obvious  choice  is  AT,  but 
such  a  choice  does  not  give  a  computationally  convenient  form  for  Q~x.  A  convenient 
choice  is  presented  in  the  following  lemma. 

Lemma  3.6.2.  When  Y  is  of  the  form 

-(4':)' 

then  Y  is  orthogonal  to  Z  and  Q  is  nonsingular,  with 

(  -stb~td  c  0  \ 

Q~x  =\  D  B~x SC  0  , 

V  0  0  l) 

where  C  =  (/  +  STB~T B~x S)~x  =  (ZTZ)~X  and  D  =  (I  +  B~x  SSTB~T)-X 


(3.6.12) 


(3.6.13) 
=  (y^K)-1. 
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Proof.  The  result  that  Z =  0  follows  easily  from  the  definition  of  Y  and  Z.  Permu¬ 
tation  of  Q  as  in  (3.6.4)  yields 


-B~XS 

I 


I  -B~lS  \ 

StB~t  I  I  .  (3.6.14) 


Performing  block  elimination  on  the  right-hand  side  of  (3.6.14)  reduces  the  permuted  Q 
to 

I  -B~XS  \ 


stb-tb~xs 


(3.6.15) 


which  is  nonsingular  since  STB~T B~x  S  is  positive  definite.  Hence  Q  is  nonsingular.  The 
result  for  Q~x  may  be  shown  by  block  premultiplication  of  (3.6.13)  by  Q.  | 

Clearly,  using  Q~l  directly  to  obtain  the  QP  gradient  does  not  look  promising.  Using 
(3.6.13)  to  form  matrix-vector  products  with  Q~l  would  require  separate  Cholesky  factor¬ 
izations  of  both  ZTZ  and  Y^Y .  While  we  would  expect  ZTZ  to  be  small  for  the  problems 
addressed  in  this  report,  the  same  cannot  be  said  for  YTY.  The  following  lemma  shows 
the  precise  form  of  (y7V)-1. 


Lemma  3.6.3.  When  Y  is  of  the  form  (3.6.12),  (Y'rY)  1  has  the  form 

( YtY)-x  =  (/  -  B~lSCSTB-T),  (3.6.16) 

where  C  =  (/  +  STB'TB-XS)~X  =  ( ZTZ)~X . 

Proof.  The  result  may  be  verified  by  forming  YTY  from  (3.6.12)  and  multiplying  the 
result  into  the  right-hand  side  of  (3.6.16).  | 

Use  of  (3.6.16)  allows  us  to  provide  a  more  convenient  form  of  Q~x  that  does  not 
require  a  factorization  of  Y^Y. 

Lemma  3.6.4.  When  Y  is  of  the  form  (3.6.12),  Q~x  has  the  form 

j  -CStB~t  C  0  \ 

Q~l  =  /  -  B-'SCStB~t  B-'SC  0  ,  (3.6.17) 

V  0  0  u 

where  C  =  (/  +  STB~T B~' S)~l  =  (ZTZ)~X. 


Proof.  The  result  for  Q~x  arises  by  the  substitution  of  (3.6.16)  into  (3.6.13).  | 

The  result  of  Lemma  3.6.4  allows  us  to  compute  the  subproblem  gradient  without 
maintaining  a  Cholesky  factorization  of  l'1)'.  To  compute  g gp  requires  four  matrix-vector 
products  with  S  or  ST,  four  solves  with  B  or  Br.  two  solves  with  C~x  =  ZTZ  and  one 
product  with  11  z\ 


Pb > 


ST tv; 


solve  Btw 

form  u 


(3.6.18) 

(3.6.19) 
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solve 

(ZTZ)y 

= 

Vs  ~ 

(3.6.20) 

form 

w 

= 

Sy ; 

(3.6.21) 

solve 

Bv 

= 

w; 

(3.6.22) 

solve 

Btw 

= 

vB  +  »; 

(3.6.23) 

form 

u 

= 

ST w] 

(3.6.24) 

form 

t 

= 

Hzy\ 

(3.6.25) 

solve 

( ZTZ)r 

= 

t  +  u; 

(3.6.26) 

form 

w 

= 

Sr; 

(3.6.27) 

solve 

Bq 

= 

w. 

(3.6.28) 

The  subproblem  gradient  may  then  be  written 

(<Jb  \  (  Pb  +  v~  9  \ 

9s  I  +  I  r  I  •  (3.6.29) 

As  when  Y  has  the  form  (3.6.2),  Q~ 1  is  defined  in  terms  of  Bq *.  Hence,  this  form  of  Y 
requires  two  separate  factorizations  of  the  basis  matrices  within  the  subproblem,  one  for 
the  initial  basis  B0  and  one  for  the  current  basis  £*. 


3.6.3.  Y  defined  using  B  1 

A  choice  for  Y  that  leads  to  a  very  simple  and  computationally  efficient  form  for  Q~l  is 
given  in  the  following  lemma. 

Lemma  3.6.5.  When  Y  is  of  the  form 

B~x  0  \ 

0  0  , 

0  I) 

Q  is  nonsingular,  and 

0  /  0  \ 

B  S  0 
0  0  7  / 

Proof.  Permutation  of  Q  as  in  (3.6.4)  yields 

-B~lS  B~'  \  /  I  \  /  S'1  -B~lS  \ 

,  /  ,).  (—) 

which  is  nonsingular  since  B  is.  Hence  Q  is  nonsingular.  The  result  for  Q-1  may  be  shown 
by  block  premultiplication  of  (3.6.31)  by  Q.  I 


(3.6.30) 


(3.6.31) 
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The  form  of  Q~ 1  allows  us  to  compute  the  QP  gradient  easily.  By  definition, 


ft jp  =  9  + 


=  9  + 


\-T 


Hz  0  0  \ 
0/0 
0  0/ 


Q~l } 


{  BT(BpB  +  Sps) 

ST(  BpB  +  Sps)  +  Hzps 

k  Pn 


(3.6.33) 

(3.6.34) 


To  compute  g^,  requires  three  matrix- vector  products,  one  with  Hz,  one  with  (  B  S  ), 
and  one  with  its  transpose: 


Since  Hz  is  small  by  comparison  to  (  B 
We  then  have 

(9b 
9s 
9n 


(3.6.35) 

(3.6.36) 

(3.6.37) 

S  ),  the  main  effort  is  in  computing  w  and  v. 


vB 

vs  +  u 

Pn 


It  is  important  to  note  that  Q~x  is  defined  in  terms  of  Bo  not  Bq  1 .  Operations  with 
Q~x  require  multiplications  with  Bq.  This  is  easily  accomplished  by  keeping  track  of  the 
indices  of  the  columns  from  A  that  make  up  Bo .  Hence,  a  benefit  of  this  choice  of  Y  is 
that  operations  with  Q~l  do  not  require  two  separate  factorizations  of  the  basis  matrices, 
in  contrast  to  the  previous  two  choices  of  Y . 

From  a  computational  standpoint,  if  we  restrict  ourselves  to  pricing  only  after  the 
feasibility  phase  of  the  QP  subproblem,  the  QP  gradient  can  often  be  computed  with 
about  half  the  effort  of  the  above  method.  This  is  because  the  matrix-vector  product  w  in 
(3.6.35)  can  be  rewritten  as  w  =  —  (c*  -  s*.)  when  the  initial  p  is  feasible  with  respect  to 
the  bounds  (as  is  usually  the  case  in  the  later  stages  as  we  near  optimality).  In  the  event 
that  the  initial  p  is  not  feasible,  w  may  be  calculated  as  w  =  —  (c*  -  Sk )  -  NpN.  This  is 
often  cheaper  than  (3.6.35)  because  pN  is  expected  to  contain  few  nonzero  elements. 


3.6.4.  A  comparison  of  the  three  choices  of  Y 

We  summarize  the  presentation  of  the  three  choices  of  Y  by  highlighting  the  computa¬ 
tional  effort  required  to  compute  the  QP  gradient.  The  first  column  of  Table  1  gives  the 
specific  choice  of  Y .  The  remaining  columns  represent  the  number  of  operations  (solves 
or  products)  with  the  matrices  B,  .$',  //z,  and  ZTZ  respectively.  It  may  be  seen  that  the 
first  and  third  choices  are  similar  in  their  computational  cost.  The  orthogonal  choice  of 
Y  requires  about  twice  the  effort  and  a  second  Cholesky  factorization.  Its  main  benefit 
is  that  the  resulting  Q  may  be  better  conditioned  than  with  the  other  two  choices.  A 
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Form  ofY 

B  mult. 

B  solve 

S  mult. 

Z1  Z  solve 

Identity 

2 

2 

1 

Orthogonal 

4 

4 

1 

2 

Inverse  B 

2 

2 

1 

Table  1.  Matrix  products  and  solves  required  to  compute  g gp  for  various  choices  of  Y . 

virtue  of  the  last  choice  of  Y  is  that  no  additional  factorization  is  required.  Moreover, 
multiplications  with  B  are  likely  to  require  less  effort  thatn  solves  with  J3,  since  the  LU 
factors  of  B  contain  at  least  as  many  nonzero  elements  as  in  B  itself.  It  is  this  definition 
of  Y  that  is  used  for  the  computational  tests  of  Section  5. 

3.7.  Updating  the  reduced  Hessian 

Changes  in  the  working  set  cause  changes  to  Z  in  the  following  situations: 

1.  A  hound  is  deleted  from  the  working  set  and  added  to  S. 

2.  A  variable  corresponding  to  a  column  in  S  hits  a  hound. 

3.  A  variable  corresponding  to  a  column  in  B  hits  a  hound. 

These  changes  to  Z  in  turn  cause  changes  to  the  reduced  Hessian  Hz.  To  ease  computation 
only  the  C'holesky  factor  of  the  reduced  Hessian  is  recurred  (instead  of  Hz).  This  section 
describes  how  the  Cholesky  factorization  flz  =  RTR  may  be  maintained.  The  updates  to 
R  are  the  same  as  in  MINOS  [MurS78]  except  in  the  first  case,  where  the  new  column  of 
R  is  not  open  to  choice. 

3.7.1.  Updates  to  Hz  arising  from  deletion  of  a  bound 

When  a  new  variable  is  added  to  the  superbasic  set,  the  reduced-gradient  form  of  Z  is 
unaltered  except  for  gaining  a  new  column:  Z  =  (  Z  z  j .  The  new  reduced  Hessian  is 
given  by 

*, -***-(**!  (3.7,, 

Let  R  denote  the  new  Cholesky  factor.  It  follows  from  (3.7.1)  that 


k  =  (  "  I  )  •  {3.7.2) 

Hence,  R  can  be  obtained  by  the  following  operations: 

form  u>  =  Hz;  (3.7.3) 

form  v  =  ZTw;  (3.7.4) 

solve  RTr  =  v;  (3.7.5) 

form  b  =  yJzTw  -  ||r|p.  (3.7.6) 
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Since  Hz  is  positive  definite,  6  is  well  defined. 

3.7.2.  Consequences  of  an  early  change  of  subspace 

In  our  discussion  of  Section  3.2  we  assumed  that  the  minimizer  would  be  found  on  the 
current  subspace  before  pricing.  This  is  not  always  practical  (for  some  of  the  same  reasons 
that  lead  us  to  consider  an  early-termination  strategy  for  subproblems).  Optimization 
algorithms  such  as  MINOS  may  terminate  the  minimization  on  a  subspace  when  the  norm 
of  the  reduced  gradient  is  smaller  than  a  dynamic  tolerance.  As  in  the  case  of  early 
termination,  this  may  help  to  avoid  unnecessary  computation  when  far  from  the  optimal 
active  set.  Unfortunately,  if  Hz  is  expanded  as  in  (3.7.1)  there  is  no  guarantee  that  the 
search  direction  resulting  from  the  solution  of  the  Newton  equations  will  provide  a  feasible 
descent  direction  for  the  QP  iteration.  This  assertion  is  reflected  in  Lemma  3.7.1. 

Lemma  3.7.1.  If  a  Lagrange  multiplier  estimate  arising  during  the  QP  subproblem  is 
used  to  delete  a  constraint  from  the  working  set,  the  search  direction  arising  from  the 
solution  of  the  Newton  equations 

RTRpz  =  - ZTg gp  (3.7.7) 

may  not  be  a  feasible  descent  direction  for  the  QP  stibproblem,  unless  ZTg ^  —  0  before  the 
constraint  is  deleted. 

Proof.  The  proof  is  given  in  [GilM79].  | 

Despite  this  result  it  may  still  be  worth  pricing  before  finding  a  minimizer.  In  the 
event  that  the  search  direction  is  not  a  feasible  direction  of  descent  we  can  simply  revert 
to  minimizing  on  the  current  (smaller)  subspace. 

The  use  of  multiple  pricing,  i.e.  choosing  more  than  one  variable  to  become  superbasic, 
causes  greater  difficulty,  since  it  is  a  combinatorial  problem  to  identify  the  subset  of 
variables  that  prevent  the  search  direction  from  being  feasible.  This  problem  does  not 
arise  in  MINOS  since  the  expanded  Cholesky  factor  of  the  reduced  Hessian  becomes 


Clearly,  the  new  search  direction  is  always  feasible  with  respect  to  the  bounds  on  the 
new  superbasic  variables.  A  strategy  of  expanding  the  reduced  Hessian  using  (3.7.8)  is 
acceptable  for  the  optimality-phase  algorithm  when  the  update  is  performed  outside  the 
subproblem.  The  strategy  is  unacceptable  inside  the  subproblem  since  the  relation 

RTR  =  ZtHZ  (3.7.9) 

would  no  longer  hold.  The  result  of  this  discrepancy  is  that  if  a  unit  step  is  taken  in  the 
QP  then  in  general  ZTg^,  ^  0  (as  would  be  the  case  if  equation  (3.7.9)  held).  Since  the 
reduced  Hessian  approximation  defines  the  full  QP  Hessian,  when  (3.7.9)  is  violated  we 
are  no  longer  solving  the  “correct”  subproblem.  When  this  occurs  it  is  necessary  to  either 
modify  the  definition  of  the  QP  Hessian  to  satisfy  (3.7.9)  or  introduce  new  iinesearch 
criteria  and  termination  conditions  for  the  subproblem  to  ensure  that  the  resulting  search 
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direction  is  a  descent  direction  for  the  merit  function.  In  the  computational  tests  of 
Section  5,  a  conservative  strategy  was  adopted  and  the  subspace  was  not  changed  until 
the  norm  of  the  reduced  gradient  (for  the  subproblem)  was  quite  small.  To  be  precise, 
minimization  on  the  subspace  was  continued  in  the  subproblem  until 

\\d  j/qpIIoo  <  vGIMU 

where  c  is  machine  precision. 

3.7.3.  Updates  to  Hz  arising  from  changes  to  5 

If  the  </-th  superbasic  variable  hits  a  bound,  the  new  null-space  matrix  is  obtained  by 
removing  the  the  9-th  column  from  S.  The  new  C'holesky  factor  is  updated  bv  applying 
plane  rotations  to  R  followed  by  the  removal  of  its  ry-th  row  and  column  (see  [GGMS74]). 

3.7.4.  Updates  to  Hz  arising  from  changes  to  B 

If  a  variable  corresponding  to  the  p-th  column  in  B  hits  a  bound,  the  updating  is  performed 
in  two  stages.  First,  the  basis  is  updated  by  replacing  the  p-th  column  in  B  with  the  9-th 
column  from  5,  where  q  is  chosen  to  keep  B  well-conditioned.  (The  procedure  requires  a 
solve  with  BT.)  Finally,  we  delete  the  chosen  column  from  5  as  described  in  Section  3.7.3. 

Let  B  denote  the  new  basis  and  2  the  corresponding  null-space  matrix.  We  shall  show 
that  2  has  the  form 

PZ  =  ZM .  (3.7.10) 

where  P  is  a  permutation  matrix  and  M  is  a  rank-one  modification  of  the  identity.  This 
expression  for  Z  enables  the  Cholesky  factor  of  ZTHZ  to  be  determined  easily  from  that 
of  ZTHZ. 

Let  a  denote  the  p-th  column  of  B,  and  suppose  it  is  exchanged  with  s,  the  q-th  column 
of  S.  Also,  let  BTr  =  ep  and  0  =  rTs.  As  in  [MurS78],  q  is  chosen  by  first  forming  y  =  STr 
and  to  some  extent  maximizing  the  “pivot,  element”  yq  —  0  =  rT$.  We  have  a  =  Bep,  and 
the  updated  basis  B  is  given  by 

B  =  B  +  (s  -  a)ep  =  (/  +  («-  a)rT)B,  (3.7.11) 

so  that 

B~l  =  B~\l  -4>(s-  a)rT),  (3.7.12) 

where  4>  =  1/0.  The  change  in  S  (before  deleting  the  q-th  column)  is 

S  =  S-ls-a)e  (3.7.13) 

Lemma  3.7.2.  When  the  basis  has  been  updated  as  in  (3.7.1 1)  and  (3.7.13),  then  2  is 
of  the  form  PZ  =  ZM ,  where 

M  =  /  +  eqvT  and  v  =  -<p(STr  +  eq).  (3.7.14) 
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Proof.  Define  Z  =  ZM.  The  top  mxns  submatrix  of  Z  after  swapping  the  9-th  column 
of  S  with  the  p-th  column  of  B  is 

B~l§  =  B~l(I  -  d>(s  -  a)rT)(S  -  (s  -  a)ej ) 

=  (J d’1  -  <j>B~l(s- a)rT)(S  -  se*  +  ae*) 

=  -  <frB~l srTS  +  <)>eprTS  -  <j>B~x  se*  +  (j>epe^ 

=  B~'s(j  -  <t>eq(rTS  +  eJ)J  +  <(>ep(rTS  +  ej) 

=  B~1SM-epvT. 


Clearly,  the  result  is  satisfied  for  all  rows  except  the  p-th: 


S  = 


v 


T 


which  should  be  ej  after  the  update.  To  make  the  update  complete  we  swap  the  p-th  row 
with  the  (m  +  g)-th  row  (which  is  e ^),  using  the  permutation  matrix 

Pp,m+q  =  (®1  •••  ^p— 1  €m+q  C p+1  ^m+q  —  l  ? p  ^m+g+1  •••  ^m+n,  )•  (3.7.15) 


Premultiplication  of  Z  by  Pp,m+q  gives 
Pp,m+qZ  —  Pp,m+q 


-B-'S 

I 

0 


=  z(/-^,(rTS  +  e[)). 


(3.7.16) 


The  permuted  update  satisfies  epZ  =  ej.  The  lower  ns  x  ns  portion  of  Z  is 


(3.7.17) 


which  is  the  identity  matrix  except  that  the  9-th  row  which  is  the  p-th  row  of  -B~lS. 
Finally,  Z  is  constructed  to  have  one  less  column  than  Z  by  having  its  g-th  column  removed 
and  its  (m  +  g)-th  row  zeroed  out  and  permuted  to  the  bottom.  | 


3.7.5.  Updating  the  Cholesky  factor  of  Hz 

The  changes  to  Z  must  be  reflected  in  updates  to  the  Cholesky  factor  of  Hz.  We  have 

Hz  =  MtHzM  =  ( Rr+  vut)(R+  uvt),  (3.7.18) 

where  u  =  Req  is  the  g-th  column  of  R  and  v  =  -<f>(STr  +  eq).  The  updated  factor  R  is 
obtained  by  reducing  R  +  uvT  to  upper  triangular  form  with  two  partial  sweeps  of  plane 
rotations  as  in  (GGMS74,  MurS78],  Finally,  because  a  superbasic  column  was  used  to 
replace  a  column  from  B ,  the  number  of  superbasic  variables  is  reduced,  and  a  row  and 
column  are  removed  from  flz  (and  its  Cholesky  factor)  as  described  in  Section  3.7.3. 


Chapter  4 

The  Quasi-Newton  Update  to  the 
Reduced  Hessian 


In  the  prototype  algorithm,  we  assume  that  the  QP  Hessian  H  approximates  W ,  the 
Hessian  of  the  Lagrangian.  The  relationship  between  W  and  H  and  computational  details 
of  updating  approximations  to  the  reduced  Hessian  of  the  Lagrangian  are  the  concerns  of 
this  section. 


4.1.  Introduction 


Quasi-Newton  methods  are  “Newton-like”  algorithms  in  which  the  Hessian  (or  classically, 
its  inverse)  is  replaced  by  an  approximation.  The  approximation  is  obtained  by  using  the 
known  curvature  along  the  directions  of  search.  Most  quasi-Newton  methods  are  based  on 
a  formula  from  the  one- parameter  family  of  updates  introduced  by  Broyden  [Bro67],  for 
unconstrained  optimization.  Define  0  €  (0, 1],  s  =  x  —  x  and  y  =  g  —  g.  The  one-parameter 
family  of  Broyden  updates  is  then 


H  =  H  - 


HsstH 
sTHs  + 


y*s 


T 


(4.1.1) 


where 


(4.1.2) 


The  choice  of  the  factor  0=1  gives  the  classical  DFP  formula,  while  the  choice  of  0  =  0 
gives  the  BFGS  formula.  At  least  in  one  sense,  the  choice  of  0  is  not  critical.  Dixon 
[Dix72a,  Dix72b]  showed  that  when  the  iterates  are  chosen  to  satisfy 


Hp  =  -5,  (4.1.3) 

x  =  x  +  ap ,  (4-1.4) 

where  a  is  the  step  that  minimizes  F(a)  =  F{x  +  ap),  the  Broyden  updates  generate 
identical  iterates  independent  of  the  choice  of  0. 

Despite  this  result,  the  performance  of  quasi-Newton  methods  does  depend  signifi¬ 
cantly  on  the  choice  of  0  because  it  is  inefficient  in  practice  to  perform  accurate  line- 
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searches.  Moreover,  even  the  effort  to  perform  the  linesearch  depends  on  4>.  For  un¬ 
constrained  problems,  <f>  =  0  (the  BFGS  update)  has  been  shown  to  be  a  good  choice 
[GMP72]. 

Recently  there  has  been  considerable  interest  in  the  rank-one  update  (see  [KBS90]  and 
[CCT91]).  Because  our  interest  is  in  exploring  the  differences  between  our  large-scale  SQP 
approach  (using  only  an  reduced  Hessian  approximation)  and  NPSOL,  we  did  not  wish  to 
make  use  of  a  different  update.  We  have  opted  to  use  the  BFGS  update  since  this  is  the 
one  used  in  NPSOL. 


4.1.1.  The  BFGS  update 

The  pedagogical  form  of  the  BFGS  update  for  unconstrained  optimization  is  given  by 


H  =  H  - 


HsstH  yy^_ 
sTHs  yTs' 


(4.1.5) 


If  H  is  positive  definite,  the  new  approximation  H  is  positive  definite  if  and  only  if  yTs  >  0. 
For  unconstrained  optimization,  we  can  always  ensure  yTs  >  0  by  choosing  appropriate 
termination  conditions  for  the  linesearch  (see  [GMSW79]).  For  nonlinearly  constrained 
optimization,  however,  yTs  >  0  is  no  longer  assured. 


4.2.  Quasi-Newton  updates  for  NLP 

A  number  of  authors  have  proposed  SQP  algorithms  for  NLP  using  H  as  a  quasi-Newton 
approximation  to  W .  This  idea  is  credited  to  Murray  who  proposed  it  in  [Mur69].  It 
has  been  used  with  success  by  others  (see  [Han76],  [Pow83],  or  [GMSW86b]  for  example). 
Complications  arise  because  of  the  constraints;  for  example: 

1.  The  BFGS  update  as  well  as  many  quasi-Newton  approximation  methods  are  depen¬ 
dent  on  the  approximation  H  being  positive  definite,  yet  it  is  not  necessary  (nor  is 
it  expected)  that  the  exact  Hessian  W  be  positive  definite,  even  at  the  solution. 

2.  Both  W  and  H  require  estimates  of  the  Lagrange  multipliers,  which  themselves  are 
nonlinear  functions  of  x. 

3.  We  can  no  longer  ensure  that  yTs  >  0. 

In  addition  to  these  difficulties,  large-scale  SQP  methods  that  approximate  the  full  Hessian 
of  the  Lagrangian  may  be  computationally  intractable  since  the  Broyden  updates  do  not 
preserve  the  sparsity  of  the  true  Hessian.  While  the  true  Hessian  may  be  quite  sparse, 
its  approximation  using  (4.1.1)  is  almost  always  completely  dense.  Thus,  as  problem  size 
grows,  the  storage  and  effort  required  to  perform  the  update  may  become  enormous. 

In  the  case  of  unconstrained  optimization,  attempts  have  been  made  to  exploit  spar¬ 
sity  in  quasi-Newton  methods  for  problems  whose  Hessian  has  a  known  sparsity  pattern. 
Unfortunately,  results  for  these  methods  have  not  been  promising  (see  [Tha83]  and  [ShaBO] 
for  example). 

To  preserve  sparsity  and  still  satisfy  the  quasi-Newton  condition  (see  Section  4.2.2)  a 
significant  amount  of  time  may  be  required  to  perform  the  linear  algebra  defining  a  suitable 
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sparse  update.  In  addition,  a  positive-definite  H  cannot  be  guaranteed  and  superlinear 
convergence  is  not  in  general  achieved. 

To  preserve  sparsity  and  still  satisfy  positive  definiteness  in  H,  it  is  possible  to  construct 
an  update  in  which  the  fill-in  (according  to  the  sparsity  pattern  of  the  true  Hessian)  is 
ignored.  Unfortunately,  satisfaction  of  the  quasi-Newton  condition  cannot  be  guaranteed 
and  superlinear  convergence  is  not  in  general  achieved. 

Recently,  Nickle  and  Tolle  [NicT89]  have  attempted  to  use  sparsity-exploiting  quasi- 
Newton  updates  within  an  SQP  method  for  constrained  problems.  They  use  the  BFGS  up¬ 
date  for  the  approximation  to  the  full  Hessian  of  the  Lagrangian  and  maintain  a  Cholesky 
factorization  of  the  approximation  that  ignores  the  fill-in  associated  with  the  standard 
update.  In  their  implementation,  they  do  not  enforce  quasi-Newton  condition.  The  suc¬ 
cess  of  this  approach  for  large-scale  NLP  has  not  been  verified  as  the  method  has  only 
been  tested  on  small  problems  (the  largest  problem  in  their  test  set  having  60  variables, 
40  linear  constraints  and  10  nonlinear  constraints). 


4.2.1.  An  approximation  to  the  reduced  Hessian 

The  poor  performance  of  late  of  sparsity-exploiting  methods  for  approximating  the  full 
Hessian  leads  us  to  explore  alternatives  for  large-scale  NLP. 

Consider  the  use  of  a  quasi-Newton  approximation  to  the  reduced  Hessian  ZTWZ. 
This  approach  for  NLP,  proposed  by  Murray  and  Wright  [MurW78],  takes  advantage 
of  the  property  that  Z*T\V*  Z*  is  positive  semidefinite.  This  result  combined  with  the 
computational  expense  of  approximating  all  of  W  makes  it  unreasonable  to  use  a  positive- 
definite  approximation  to  W  for  large-scale  problems. 

As  discussed  in  Section  2,  to  ensure  two-step  superlinear  convergence  of  the  prototype 
algorithm  we  have  assumed  that  our  approximation  to  the  Hessian  is  accurate  on  the  null 
space  of  the  active  constraints.  Specifically,  we  assume  the  approximation  to  the  reduced 
Hessian  satisfies 

II Zl(Hk  -  Wk)ZkpZk\\  =  o(\\Pk\\),  (4.2.1) 

where  Wk  denotes  the  Hessian  of  the  Lagrangian  at  xk.  Although  (4.2.1)  cannot  be  verified 
computationally,  our  goal  is  to  approximate  this  condition  by  using  a  positive-definite 
approximation  to  the  reduced  Hessian  at  the  initial  point,  followed  by  a  quasi-Newton 
update  to  the  approximation  at  the  end  of  each  ma  jor  iteration.  The  update  must  satisfy 
the  following  minimal  requirements: 

•  The  new  reduced  Hessian  approximation,  Hz  =  ZTH Z ,  must  be  positive  definite. 

•  The  “reduced”  quasi-Newton  condition  must  be  satisfied  (see  next  section). 

In  addition,  there  may  be  special  cases  to  be  considered.  For  example,  when  all  of  the  NLP 
constraints  are  linear,  Lagrange  multiplier  estimates  are  not  required  to  update  the  Hessian 
approximation.  This  is  not  generally  the  case  for  nonlinearly  constrained  problems  but 
curiously,  one  form  of  the  BFGS  update  for  the  reduced  Hessian  does  not  require  Lagrange 
multiplier  estimates,  which  clearly  circumvents  any  difficulties  arising  from  such  estimates 
being  poor.  The  precise  form  of  the  updates  are  discussed  in  the  following  sections. 


4.2  Quasi-Newton  updates  for  NLP 


47 


4.2.2.  The  quasi-Newton  condition 

In  the  unconstrained  case,  the  quasi-Newton  condition  may  be  written  as 

Hs  =  y,  (4.2.2) 

where  s  =  x—x  and  y  =  g—g.  Likewise,  for  the  linearly  constrained  case,  the  quasi-Newton 
condition  for  the  reduced  Hessian  is 


Hzsz  =  Vz,  (4.2.3) 

where  s2  =  ZT(x  -  x)  and  yz  =  ZT(g  -  g ).  Note  that  no  additional  gradients  are  necessary 
to  compute  yz. 

It  can  be  shown  that  the  reduced  Hessian  obtained  in  the  linearly  constrained  case  is 
identical  to  the  matrix  obtained  by  updating  the  full  matrix  and  then  forming  the  reduc¬ 
tion.  Unfortunately,  this  property  is  no  longer  true  for  the  case  of  nonlinear  constraints. 
The  difference  is  due  to  the  fact  that  the  step  s  taken  is  not  of  the  form  s  =  Zsz .  It  is  no 
longer  clear  what  the  definition  of  sz  and  yz  should  be.  An  approach  adopted  by  Coleman 
and  Conn  [ColC84]  is  to  solve  Hzsz  =  -ZTg,  evaluate  g(x  +  Zsz)  and  then  form  the 
“correct”  y  corresponding  to  this  intermediate  step.  This  requires  an  additional  gradient 
evaluation  per  iteration.  Other  candidates  for  s  and  y  include 


s  — 

ZT(x  ■ 

- 

or 

(4.2.4) 

s  = 

ZT(x  ■ 

-  *), 

or 

(4.2.5) 

s  = 

ZTx  - 

ZTx , 

and 

(4.2.6) 

y  = 

ZT(g- 

-ffK  or 

(4.2.7) 

y  = 

ZT(g- 

-  g  + ' 

\T\),  or 

(4.2.8) 

y  = 

ZTg- 

ZTg. 

(4.2.9) 

The  motivation  for  these  choices  for  s  are  straightforward.  For  y ,  note  that  (4.2.7)— (4.2.8) 
are  transformations  of  yL  =  VxL(x,  A)  -  VxL(x,X)  under  either  Z  or  Z.  Equation  (4.2.9) 
presents  a  practical  choice  of  y  that  does  not  need  multiplier  estimates.  An  additional 
candidate  for  s  is  analogous  to  the  one  in  MINOS  [MurS87],  namely 


,s  =  aps,  (4.2.10) 

where  a  is  the  steplength  from  the  linesearch  (which  is  used  each  minor  iteration  to  reduce 
the  augmented  Lagrangian  objective).  For  the  prototype  SQP  algorithm,  a  is  taken  as 
the  steplength  from  the  merit  function  linesearch  (performed  each  major  iteration).  The 
motivation  for  this  choice  of  s  arises  from  the  fact  that  when  the  correct  active  set  has  been 
identified,  each  QP  subproblem  is  solved  in  a  single  iteration.  The  superbasic  component 
of  the  search  direction  is  found  by  solving 

HzPs  ~  ~ZTgQp,  (4.2.11) 

where  g^,  — *  g  as  p  — *  0  near  the  solution.  Prior  to  the  computational  tests  described  in 
Section  5  each  of  these  cases  for  y  and  s  was  tested.  A  decision  based  on  the  tests  was 
then  made  to  use  (4.2.10)  for  s  and  (4.2.9)  for  y. 
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Because  NLP  is  an  inequality  constrained  problem,  the  number  of  super  basic  variables 
usually  changes  between  major  iterations.  Thus,  Zk  and  Zk+  \  (and  hence  Hz  and  Hz)  may 
not  be  the  same  size.  It  would  seem  that  s  and  y  may  not  be  well  defined.  Fortunately, 
this  is  not  a  difficulty.  During  the  solution  of  the  QP  subproblem  the  basis  Bk,  the  reduced 
Hessian  Hz  and  the  index  set  of  superbasic  variables  are  updated  each  minor  iteration. 
As  a  result  the  set  of  superbasics  and  the  size  of  the  reduced  Hessian  are  the  same  at 
the  completion  of  the  last  major  iteration  as  at  the  start  of  the  next  major  iteration. 
The  quasi- Newton  update  takes  place  at  the  start  of  the  new  major  iteration  and  can  be 
completed  without  excessive  expense.  The  key  idea  is  to  ensure  that  the  reduced  gradient 
ZTg  used  in  the  update  corresponds  to  the  Z  at  the  end  of  the  last  major  iteration. 


4.3.  Modifications  to  the  BFGS  update 

At  times  it  may  not  be  possible  to  use  the  standard  BFGS  update  for  the  reduced  Hessian 
without  encountering  numerical  difficulties.  Subsequent  sections  discuss  modifications  to 
the  BFGS  update  for  the  following  cases: 

•  When  yTs  <  0. 

•  When  ||*||  is  “small”. 

•  When  yTs  >  0  but  yTs  is  “small”. 


4.3.1.  When  yTs  <  0 

Normally,  when  yTs  <  0  it  is  necessary  to  skip  the  BFGS  update  to  avoid  indefiniteness  or 
singularity  in  the  new  approximation.  One  modification  of  the  BFGS  update  for  the  full 
Hessian  is  due  to  Powell  [Pow78a]  and  attempts  to  perform  the  update  under  conditions 
when  yTs  <  0.  Let  i /  €  (0, 1].  The  Powell  modification  defines  the  new  approximation  as 


I1ssTlI  ,  ddT 
Tils  +  sTd ’ 


(4.3.1) 


where  d  =  0y  +  ( 1  —  0) Hs  and 


0  = 


( 


1 

(1  -  g)sTII s/[sTII s  —  yTs) 


if  yT$  >  ijstHs, 
otherwise. 


(4.3.2) 


This  modification  ensures  that  11  is  positive  definite  and  the  determinant  of  H  is  no  less 
than  »/  times  the  determinant  of  //.  For  computational  purposes,  Powell  sets  g  =  0.2. 

The  priorities  are  different  when  updating  an  approximation  to  the  reduced  Hessian. 
It  may  be  that  the  search  direction  lies  (almost)  entirely  in  the  range  space  of  A.  In  such 
circumstances  there  is  no  point  in  updating  Hz  (since  there  is  no  new  information).  This 
idea  is  reinforced  by  the  results  of  Coleman  and  Fenyes  ([C0IF88],  page  11,  Corollary  3.6), 
who  give  updates  for  approximations  to  ZT\VZ  and  ZT\VY  and  show  there  instances  when 
yTs  <  0  leads  to  updating  only  the  approximation  to  YTHY  (while  skipping  the  update 
of  ZTII Z).  For  this  reason  the  Powell  update  was  not  adopted  and  updates  were  skipped 
when  yTs  <  0. 


4.3  Modifications  to  the  BFGS  update 
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4.3.2.  When  ||«||  is  small 

A  further  problem  with  the  standard  BFGS  update  arises  when  ||s||  <  ||y||.  If  this  occurs, 
the  modification  of  Powell  (4.3.1)-(4.3.2)  may  not  help.  To  see  why,  note  that  it  is  possible 
to  construct  examples  where  ||s||  <  ||y||  and  yTs  >  sTHs.  Hence,  the  update  using  (4.3.2) 
may  make  the  new  H  nearly  singular. 

As  discussed  in  [Gur87],  the  method  of  Coleman  and  Conn  [ColC84]  may  be  interpreted 
as  a  method  that  prevents  these  difficulties  when  ||s||  is  small.  They  define  an  intermediate 
point  x  =  x  +  Zpz,  obtained  by  first  solving  Hzpz  =  -ZTg.  They  use  the  standard  BFGS 
update  (4.1.5)  with  an  orthogonal  Z  and  with  s  and  y  defined  as 


ZT(x  -  x) 

(4.3.3) 

- Hz-xZrg , 

(4.3.4) 

Zr(VrZ(i,A)-  VxL(x,A)) 

(4.3.5) 

ZT(g  -  ATX)  -  ZTg , 

(4.3.6) 

which  ensures  that  ||s||  is  not  too  much  smaller  than  ||y||.  Using  this  method  they  were 
able  to  prove  two-step  superlinear  convergence  of  an  SQP  algorithm.  As  pointed  out  in 
Section  4.2.2,  this  approach  requires  an  additional  set  of  gradient  evaluations. 

Nocedal  and  Overton  [Noc085]  have  used  an  alternative  strategy  that  does  not  require 
extra  gradients.  Their  method  is  based  on  skipping  the  update  when  ||s||  is  small  relative 
to  a  range-space  projection  of  (x  —  x ).  They  define  v  =  YT(x  —  x)  and  perform  the  update 
(4.1.5)  only  if 

IMI  <  (u?/(fc  +  l)1+l/)||s||,  (4.3.7) 

where  u;  and  i /  are  positive  constants  and  k  corresponds  to  the  current  iteration.  The 
authors  show  two-step  superlinear  convergence  of  an  SQP  algorithm  under  various  con¬ 
ditions.  For  their  computational  tests  they  used  u  =  1  and  u  —  0.01.  This  strategy  was 
not  adopted  for  the  implementation  of  the  prototype  SQP  algorithm.  Instead,  we  use  a 
self-scaled  update,  as  described  next. 


4.3.3.  The  self-scaled  BFGS  update 

When  yTs  is  “small”  we  could  skip  the  update  to  prevent  the  updated  approximation  from 
being  close  to  singularity.  This  is  not  an  ideal  strategy  since  some  change  in  the  curvature 
of  the  Lagrangian  generally  takes  place.  To  avoid  skipping  the  update  we  may  employ  the 
self-scaled  quasi-Newton  update  [Ore74].  This  is  exactly  the  same  as  the  standard  BFGS 
update  except  that  the  current  approximation  is  scaled  by  a  dynamic  factor  7: 


H  =  7//  - 


HsstH 
7  sTfls 


+ 


T 

yy 

yMs 


(4.3.8) 


As  described  in  Brodlie  [Bro77],  the  self-scaled  update  exhibits  the  property  of  a  mono- 
tonically  decreasing  condition  number  for  //  provided  the  scaling  factor  7  is  chosen  to 
satisfy 


1  —  (i 


T 

y s 
sTH  s 


+  (!-/*) 


yTH 


-1 


y 


yTs 


(4.3.9) 
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for  l3  6  [0, 1].  If  =  1  then  7  =  yTs/sTHs.  With  this  choice  of  7,  (4.3.8)  has  the  property 
of  correcting  the  curvature  along  s  before  performing  the  update  as  well  as  after  the  update: 

sr(7 H)s  =  ( yTs/sTHs)sTHs  =  yTs  and  (4.3.10) 

sTIIs  =  yTs.  (4.3.11) 

Other  choices  for  d>  and  7  have  been  studied  by  Brodlie  [Bro77]  to  maintain  well-conditioned 
approximations  to  the  Hessian.  Oren  and  Spedicato  [OreS76]  study  different  choices  for 
<j>  and  7  and  give  optimal  choices  that  minimize  a  sharp  bound  on  the  condition  number 
of  the  inverse  Hessian  approximation  at  each  iteration.  For  the  computational  tests  de¬ 
scribed  in  Section  5  we  use  the  standard  BFGS  method  (4.1.5)  for  the  reduced  Hessian 
when 

VTP  >  ( 1  -  n)pTHp.  (4.3.12) 

where  y  and  s  are  defined  by  (4.2.9)  and  (4.2.10)  respectively,  and  g  is  a  linesearch  param¬ 
eter  (typically  g  =  0.9).  When  (4.3.12)  is  not  satisfied  but  yTs  is  positive,  the  self-scaled 
BFGS  update  (4.3.8)  is  performed  with  7  =  yTs/sTHs. 


4.3.4.  Updating  the  Cholesky  factors  of  the  reduced  Hessian 

In  classical  implementations  of  quasi-Newton  methods,  the  inverse  approximation  H~l 
was  updated  at  each  iteration.  Although  convenient,  this  technique  may  create  serious 
numerical  difficulties.  For  example,  due  to  rounding  error  in  just  one  iteration,  all  subse¬ 
quent  approximations  to  the  inverse  Hessian  may  be  indefinite.  Unfortunately,  it  is  not 
generally  possible  to  determine  if  the  approximation  is  singular  or  indefinite  by  a  simple 
examination  of  the  matrix  itself. 

In  the  prototype  algorithm  the  system  of  equations  Hzpz  =  —  ZTg  must  be  solved  to 
determine  the  superbasic  search  direction.  A  reliable  and  convenient  approach  is  based 
on  using  the  Cholesky  factors  of  Hz ,  as  developed  by  Gill  and  Murray  [GilM72]  for  un¬ 
constrained  optimization.  The  method  has  the  advantage  of  being  able  to  detect  easily  a 
singular  or  indefinite  approximation  to  the  Hessian.  In  addition,  there  is  no  penalty  for 
maintaining  an  approximation  to  the  Hessian  rather  than  its  inverse.  Let  Hz  =  RTR.  To 
obtain  the  search  direction  we  solve  RTir  =  -ZTg  and  Rpz  =  w  using  forward  and  back 
substitution. 

For  the  computational  tests  of  Section  5  we  have  implemented  the  prototype  algorithm 
using  updates  to  the  Cholesky  factors  of  the  reduced  Hessian.  As  described  in  Dennis  and 
Schnabel  [DenS83]  for  unconstra:ned  optimization,  the  update  to  Hz  (4.1.5)  is  a  rank-two 
modification  to  RTR,  reflected  in  a  rank-one  update  to  R  itself: 


where 


r  =  R  +  iaRsKy  ~  qHzs)T 
yTs 


(4.3.13) 


(4.3.14) 


The  factor  R  is  returned  to  upper  triangular  form  using  plane  rotations  as  described  in 
[GGMS7-I]. 
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4.3.5.  The  update  for  self-scaled  factors 

When  the  self-scaled  BFGS  update  (4.3.8)  is  used,  the  update  to  the  Cholesky  factors  has 
the  form 

R  =  „R+(aRs)iy-a’,H*,)T, 

yJs 

where  i)  =  a  and  a  is  given  by  (4.3.14). 


(4.3.15) 


Chapter  5 

Computational  Results 


In  this  chapter  numerical  results  obtained  from  a  sparsity-exploiting  implementation  of 
the  prototype  SQP  algorithm  (Algorithm  2.3.1)  are  given.  The  tests  consist  of  solving 
two  sets  of  test  problems  using  the  new  algorithm  and  comparing  the  results  with  those  of 
MINOS  and  NPSOL.  The  first  set  of  problems  are  from  the  literature  and  are  all  dense  and 
relatively  small.  The  second  set  of  problems  are  sparse  optimal  control  problems.  The 
purpose  of  the  testing  is  to  demonstrate  the  large-scale  SQP  algorithm’s  strengths  and 
weaknesses. 

5.1.  Implementation 

The  implementation,  hereafter  referred  to  as  LSSQP,  has  been  written  as  a  major  modifi¬ 
cation  of  the  mathematical  programming  system  MINOS.  For  a  description  of  MINOS  see 
(MurS78,  MurS82]  and  [MurS87]. 

Many  features  and  modules  of  the  Fortran  code  in  MINOS  were  used  in  LSSQP,  includ¬ 
ing- 

•  MPS  data  handling  and  hashing  routines.  These  routines  are  used  to  read  in  the 
data  corresponding  to  constraint  rows  and  columns,  coefficients  for  linear  constraints 
(if  any),  right-hand  side  values  for  constraints,  and  bounds  and  initial  values  for  the 
variables. 

•  Basis-handling  routines  for  factorizing  and  updating  the  LU  factors  of  the  basis  B 
(routines  from  the  LUSOL  package  as  described  in  [GMSW87]). 

•  The  pricing  routine  used  to  calculate  reduced  costs  for  nonbasic  variables. 

•  The  linear  search  routine  used  to  calculate  the  step  to  the  nearest  bound  in  the  QP 
subproblem. 

•  The  nonlinear  search  routine  used  to  find  a  steplength  a  that  sufficiently  reduces 
the  augmented  Lagrangian  merit  function. 

•  Routines  for  updating  the  Cholesky  factors  of  the  reduced  Hessian  following  changes 
in  the  working  set. 
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Because  LSSQP  and  MINOS  share  many  of  the  same  routines,  the  differences  in  the  com¬ 
putational  performance  of  the  two  methods  are  due  to  the  basic  approach.  The  main 
features  of  LSSQP  and  some  of  the  differences  between  SQP  methods  (LSSQP  and  NPSOL) 
and  MINOS  are  discussed  in  the  following  sections. 

5.1.1.  Form  of  the  subproblem 

For  problems  with  nonlinear  constraints,  MINOS  uses  a  sequential  linearly  constrained 
(SLC)  method  and  evaluates  the  functions  and  gradients  during  each  minor  iteration. 
NPSOL  and  LSSQP  solve  quadratic  programming  subproblems  (which  have  the  same  lin¬ 
earized  constraints)  but  evaluate  nonlinear  functions  and  gradients  only  after  termination 
of  a  subproblem  (as  part  of  each  major  iteration). 

SLC  and  SQP  methods  deal  with  the  constraint  linearizations  the  same  way.  The 
subproblems  differ  in  their  objective  functions.  In  general,  evaluating  a  QP  objective 
should  be  less  expensive  than  evaluating  a  general  nonlinear  objective.  This  should  give 
an  advantage  to  an  SQP  method  over  a  Lagrangian  method.  However,  for  some  simple 
problems  this  advantage  may  be  negated  by  the  expense  of  evaluating  the  QP  gradient. 
For  SQP  methods,  most  of  the  expense  associated  with  the  QP  gradient  comes  from  the 
matrix-vector  multiplication  with  the  QP  Hessian  (i.e.  forming  Hp).  While  NPSOL  recurs 
an  approximation  to  the  full  Hessian  of  the  Lagrangian  (so  the  matrix- vector  product  is 
straightforward),  LSSQP  maintains  an  approximation  on  a  subspace  and  therefore  requires 
a  more  complex  approach  to  obtain  the  full  QP  gradient  (see  Section  3.5). 

Since  the  subproblem  solved  in  MINOS  has  a  general  nonlinear  objective,  one  hypoth¬ 
esis  is  that  the  number  of  iterations  required  to  solve  a  subproblem  is  likely  to  be  greater 
than  the  number  required  by  NPSOL  or  LSSQP. 


5.1.2.  Quasi-Newton  updates 

In  MINOS,  the  gradient  of  the  objective  is  evaluated  each  minor  iteration.  With  this 
information  available,  a  quasi-Newton  update  to  the  reduced  Hessian  is  performed  each 
minor  iteration.  In  SQP  methods,  the  gradient  of  the  objective  is  evaluated  before  the 
beginning  of  a  major  iteration.  As  a  result  ,  a  quasi-Newton  update  to  the  reduced  Hessian 
is  performed  only  once  per  major  iteration.  With  fewer  opportunities  to  update  the 
reduced  Hessian,  a  hypothesis  is  that  LSSQP  and  NPSOL  may  require  more  major  iterations 
than  MINOS.  As  already  noted,  we  may  expect  SQP  methods  to  require  fewer  minor 
iterations  per  major  iteration  than  methods  such  as  MINOS,  but  it  is  not  clear  which 
approach  is  likely  to  require  fewer  total  minor  iterations. 


5.1.3.  Basis  refactorization 

The  initialization  of  a  major  iteration  consists  of  setting  up  the  subproblem  to  be  solved. 
Part  of  this  process  involves  linearization  of  the  nonlinear  constraints  (if  any).  A  basis  B 
is  then  formed  and  factorized.  The  cumulative  effort,  to  factorize  the  basis  at  the  start  of 
each  major  iteration  may  constitute  a  large  percentage  of  the  overall  expense  of  solving 
NLP.  Because  of  this,  methods  requiring  few  total  minor  iterations  may  not  be  efficient  if 
they  require  many  major  iterations. 
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SQP  methods  such  as  LSSQP  should  prove  to  he  more  efficient  on  large  sparse  problems 
if  the  average  effort  to  factorize  the  basis  (for  each  subproblem)  is  small  compared  to  the 
average  effort  to  evaluate  functions  and  gradients  during  solution  of  the  subproblem. 

5.1.4.  Termination  conditions 

Let  £Cpt  be  an  optimality  tolerance  and  6fea  be  the  feasibility  tolerance  for  nonlinear 
constraints  (typically,  £0pt  =  10-6  and  £fea  =  10-6).  Let  p  be  the  search  direction  obtained 
from  the  last  QP  subproblem.  In  all,  five  criteria  must  be  satisfied  if  a  point  x  is  to  be 
considered  optimal  for  NLP: 


< 

^opt  (i  +  INI), 

(5.1.1) 

ll(<-*)||oo 

< 

^fea* 

(5.1.2) 
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^opt  ( 1  +  |E| ), 

(5.1.3) 
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(5.1.4) 

sign(ar>  -  lj)gj 

< 

+^opt  (1  +  max(|F|, Halloo), 

j  nonbasic. 

(5.1.5) 

where  g  is  the  gradient  of  F,  gN  is  the  subproblem  gradient  of  the  nonbasic  variables  and 

Vj  —  (  OqP  )j  ~  ft  (lJ  • 

The  termination  conditions  (5.1.1 )-( 5.1.3)  are  similar  to  those  in  NPSOL  [GMSW86a], 
Conditions  (5.1.4 )— (5.1.5)  are  evaluated  only  at  feasible  points  for  the  subproblem  and 
correspond  to  having  no  nonbasic  variable  with  a  nonoptimal  Lagrange  multiplier  (of  a  sig¬ 
nificant  magnitude).  This  condition  is  tested  as  a  by-product  of  pricing  (Procedure  3.2.1). 
If  a  minimizer  has  been  reached  on  the  current  subspace  and  the  pricing  procedure  finds 
no  nonbasic  variable  with  a  suitable  reduced  cost,  the  current  p  is  a  solution  for  the  sub¬ 
problem.  Note  that  when  a  subproblem  is  terminated  early,  these  conditions  will  not  be 
satisfied.  Also,  a  dynamic  tolerance  (which  is  gradually  reduced  to  £0pt)  is  optionally 
used  to  check  for  optimal  multipliers.  Early  subproblems  may  therefore  be  solved  to  only 
moderate  accuracy.  Termination  condition  (5.1.3)  differs  from  (5.1.4)— (5.1.5)  in  that  it 
checks  whether  the  norm  of  the  reduced  gradient  for  NLP  is  small  enough. 

For  the  experimental  results  we  defined  the  optimality  tolerance  60pt  =  10-6  for  all 
problems  unless  specified  in  the  “ Comment "  section  of  Tables  7  and  8.  The  termination 
conditions  for  MINOS  are  described  in  [MurS87].  In  general,  all  methods  (NPSOL,  MINOS, 
and  LSSQP)  obtained  solutions  to  similar  accuracy  on  all  test  problems  solved.  On  many 
problems  the  convergence  is  fast,  so  that  small  differences  in  the  accuracy  of  the  methods 
are  not  significant. 

5.1.5.  Testing  Environments 

Section  5.3  gives  numerical  results  for  92  test  problems  described  in  Section  5.2.  These 
problems  are  sorted  into  two  test  sets.  The  first  set  (of  smaller  problems)  was  tested  using 
a  Digital  Equipment  Corporation  VAXstation  II  with  9  megabytes  of  main  memory.  The 
operating  system  was  VAX/VMS  version  5.0.  All  Fortran  files  were  compiled  under  VAX 
FORTRAN  version  5.2-015  using  the  default  options,  including  code  optimization. 

The  second  set  (of  larger  problems)  was  tested  using  a  Digital  Equipment  Corporation 
DECstation  3100  with  24  megabytes  of  main  memory.  The  operating  system  was  Ultrix 
version  4.1.  Fortran  files  were  compiled  under  full  optimization  for  these  tests. 
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5.2.  Test  problems 

The  test  problems  used  in  this  section  come  from  a  variety  of  applications.  There  are  two 
test  problem  sets  described  next. 

5.2.1.  The  small  test  problems 

The  first  set  of  test  problems  consists  of  80  small  problems  (n  <  100),  whose  names  and 
statistics  are  given  in  Tables  7  and  8. 

The  first  three  columns  give  the  problem  number,  problem  name  and  comments  such 
as  the  type  of  application,  author,  or  alternative  name  of  the  test  problem.  Comments 
may  also  allude  to  a  different  starting  point  from  one  given  in  the  literature  or  special 
features  of  the  MINOS  options  file.  Columns  4-7  give  the  number  of  variables  for  the 
problem,  the  number  of  linear  constraints,  and  the  number  of  nonlinear  constraints.  The 
final  column  gives  the  optimal  (published)  objective  value. 

Unless  noted  in  the  Comment  section,  the  Jacobian  for  these  problems  is  treated  in 
a  dense  manner  by  MINOS  and  LSSQP.  (NPSOL  treats  all  problems  as  dense.)  Likewise, 
unless  noted,  the  initial  starting  points  for  the  problems  are  the  ones  given  in  the  published 
references. 

These  problems  have  been  used  to  test  the  mathematical  programming  code  NPSOL 
[GMSW86a]  and  many  are  considered  to  be  difficult  to  solve.  Several  of  the  problems  do 
not  satisfy  the  assumptions  in  Section  2  that  were  used  in  the  proof  of  convergence  for 
the  prototype  algorithm.  For  example,  in  some  cases  the  Jacobian  at  the  solution  has  less 
than  full  row  rank.  Other  problems  do  not  satisfy  the  strict  complementarity  conditions 
or  have  infeasible  subproblems.  The  test  problems  from  the  small  set  are  taken  from  the 
following  sources: 

•  Problem  1  is  the  sample  test  problem  distributed  with  NPSOL.  It  is  described  in 
[GMSVV86a]. 

•  Problems  2,  6-9,  39-41  and  45-79  have  the  prefix  HS.  They  correspond  to  the  same 
numbered  problems  from  Hock  and  Schittkowski  [HocS81].  The  upper  bound  for  X3 
in  problem  number  52  (HS  70)  was  changed  from  1.0  to  0.9999,  as  otherwise  MINOS 
would  evaluate  the  objective  at  a  singularity.  The  modification  does  not  change  the 
optimal  solution. 

•  Problems  3  and  10  have  the  prefix  AS.  They  correspond  to  the  same  numbered 
problems  from  Schittkowski  [Sch87]. 

•  Problems  4-5  and  16  are  described  in  [MurS82].  Problems  4-5  correspond  to  the 
problems  Wright  No.  4  and  Wright  No.  9.  The  starting  points  for  these  two  problems 
are  from  point  (d)  of  the  reference. 

•  Problems  11-15,  17,  32-38,  42-44  and  80  are  from  Prieto  [Pri89].  Problem  36  is 
solved  again  as  problem  37  with  an  alternate  starting  point  of  xq  =  (0.097,0.063). 

•  Problems  18-21  are  from  Fraley  [FraSS]. 

•  Problems  22-31  are  from  Boggs  and  Tolle  [BogT84]. 
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It  should  be  noted  that  the  functions  and  gradients  for  the  small  problems  are  usually 
very  cheap  to  evaluate.  Many  of  the  problems  have  been  chosen  to  be  test  problems 
for  precisely  this  reason,  but  historically  it  has  been  assumed  that  the  efficiency  of  an 
algorithm  is  measured  by  the  number  of  function  and  gradient  evaluations  required  to 
find  a  minimizer.  We  will  be  concerned  with  these  measures  of  efficiency  as  well  as  others 
such  as  total  CPU  time  required  to  solve  the  test  set. 

Many  of  the  small  test  problems  have  multiple  local  minimizers.  As  shown  in  Tables  9- 
16,  different  methods  (NPSOL,  MINOS  or  LSSQP)  may  converge  to  different  minimizers. 

Fourteen  of  the  small  test  problems  contain  only  linear  constraints:  problems  10,  12- 
14,  24,  43-44,  47-49,  61,  76-77  and  80.  On  such  problems  MINOS  requires  only  a  single 
major  iteration.  This  is  not  true  for  the  SQP  methods.  The  differences  between  the 
approaches  arise  from  the  fact  that  the  reduced  Hessian  approximation  is  updated  each 
minor  iteration  for  MINOS  and  each  major  iteration  for  SQP  methods.  Thus,  MINOS 
generally  has  more  opportunities  to  perform  the  update.  As  a  result,  we  would  expect  the 
performance  of  NPSOL  and  LSSQP  to  be  similar  on  these  linearly  constrained  problems 
(but  to  differ  from  that  of  MINOS). 

5.2.2.  Run-time  parameters:  Small  test  set 

In  MINOS,  NPSOL  and  LSSQP  the  SPECS  or  options  file  sets  various  run-time  parameters 
that  describe  the  nature  of  the  problem  to  be  solved  and  the  quality  of  the  solution  to  be 
obtained.  The  options  file  must  begin  with  the  keyword  “BEG in” .  Each  subsequent  line  of 
the  options  file  contains  one  or  more  keywords  and  an  associated  value.  For  example,  the 
line 


Nonlinear  Constraints  14 

specifies  that  the  problem  has  14  nonlinear  constraints.  The  last  line  of  the  options  file  is 
signified  with  the  keyword  “end”.  A  full  description  of  the  MINOS  and  NPSOL  options  can 
be  found  in  [MurS87]  and  [GMSW86a]  respectively.  The  LSSQP  solver  maintains  the  use 
of  all  MINOS  options  as  well  as  a  few  others,  such  as  whether  or  not  to  use  self-scaling  for 
the  BFGS  update  to  the  reduced  Hessian.  Except  where  noted  in  the  Comment  section 
of  Tables  7-8,  a  uniform  set  of  options  was  used  for  all  runs.  An  example  of  the  MINOS 
options  file  (for  problem  number  1)  is  given  in  Figure  1. 

Problems  1-3  used  the  “Jacobian  Sparse”  option.  Problems  4-80  used  the  “Jacobian  Dense” 
option.  Problems  with  more  than  10  variables  used  a  SPECS  file  that  increased  the  major 
iterations  limit  to  300  and  the  total  minor  iterations  limit  to  1000.  All  other  parameters 
were  set  to  their  default  options.  For  the  runs  with  NPSOL  the  default  options  were  used 
(  see  [Pri89]). 

5.2.3.  The  large  test  problems 

The  large  test  problems  come  from  a  class  of  applications  known  as  trajectory  optimization. 
All  were  generated  using  the  system  OTIS  [HarP87],  and  their  specifications  are  given  in 
Table  1.  A  description  of  these  sparse  optimal  control  problems  and  their  mathematical 
programming  formulation  is  given  in  the  Appendix.  It  is  important  to  note  that  the  OTIS 
function  routines  compute  first  derivatives  by  finite  differences.  Advantage  is  taken  of  the 
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BEGII  Hexagon  (Sparse  version) 

Problem  Number  1 

Nonlinear  Constraints  14 
Nonlinear  Variables  9 

Iterations  300 

Major  Iterations  50 

Print  Level  0 

Jacobian  Sparse 

END  Hexagon 


Figure  1.  Sample  SPECS  file  for  small  test  problems 

sparsity  pattern  of  the  Jacobian,  but  this  is  partly  why  the  function/gradient  evaluations 
are  expensive.  Also,  the  truncation  error  in  the  gradients  makes  it  difficult  to  confirm 
“optimality”  unless  <50pt  is  raised  to  about  10-4. 


No. 

Problem 

Comment 

n 

Leon. 

Neon. 

r  F(x*) 

1 

F4  Min-time  climb 

6  Nodes 

112 

8 

126 

1.0280052864e+00 

2 

F4  Min-time  climb 

7  Nodes 

130 

8 

150 

1.003055 146  le+00 

3 

F4  Min-time  climb 

8  Nodes 

148 

8 

174 

9.9471482986e— 01 

4 

F4  Min-time  climb 

9  Nodes 

166 

8 

198 

9.901 1108551e-01 

5 

F4  Min-time  climb 

10  Nodes 

184 

8 

222 

9. 87586471 60e— 01 

6 

F4  Min-time  climb 

11  Nodes 

202 

8 

246 

9.853 1 9694 86e— 01 

7 

F4  Min-time  climb 

12  Nodes 

220 

8 

270 

9.8519559525e— 01 

8 

F4  Min-time  climb 

14  Nodes 

256 

8 

318 

9.8490191 159e-01 

9 

F4  Min-time  climb 

15  Nodes 

274 

8 

342 

9. 83472451 92e— 01 

10 

F4  Min-time  climb  NP 

6  Nodes 

112 

8 

104 

1.0018209554e+00 

11 

F4  Min-time  climb  NP 

15  Nodes 

274 

8 

284 

9.8165983233e— 01 

12 

VTOL  Descent 

435 

24 

483 

—  1  -6825846298e+00 

Table  1.  Large  problem  statistics. 


5.2.4.  Minimum  time-to-climb  problems 

The  first  9  large  test  problems  are  for  a  minimum  time-to-climb  problem  [Bry69]  for  an 
F4  Phantom  II  supersonic  interceptor.  The  aim  is  to  find  the  pitch  function  to  take  the 
aircraft  from  sea  level  and  Mach  0.34  to  an  altitude  of  20  km  (as  65,617  ft)  and  Mach  1.0 
in  minimum  time.  The  problems  differ  in  the  number  of  distinct  time  segments  or  nodes 
used  to  define  the  problem.  The  number  of  nodes  varies  from  6  to  15.  In  general,  as  the 
number  of  nodes  increases,  the  model  becomes  more  accurate  and  the  optimal  objective 
decreases  but  the  problem  becomes  more  difficult  to  solve. 

Problems  10-11  correspond  to  the  F4  minimum  time-to-climb  problem  in  which  the 
pressure  constraint  has  been  omitted  from  the  problem  formulation.  Although  the  number 
of  constraints  is  fewer,  the  problem  appears  to  be  more  difficult  to  solve. 
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Problem  12  is  the  the  VTOL  Descent  problem,  which  finds  the  optimal  descent  trajec¬ 
tory  for  a  vertical  take-off  and  landing  aircraft. 


5.2.5.  Run-time  parameters:  Large  test  set 

Most  of  the  standard  MINOS  run-time  options  were  used.  Special  options  for  the  large 
problems  were  contained  in  a  single  options  file  used  for  all  large  runs  using  either  MINOS, 
NPSOL  or  LSSQP  as  the  solver.  The  MINOS  options  file  for  these  runs  is  given  in  Figure  2. 


BEGIN  OTIS  (Trajectory  Problems) 


Major  Iterations  500 
Minor  Iterations  200 
Iterations  10000 

Linesearch  Tolerance  0.5 

Roe  Tolerance  1 . 0E-05 
Function  Precision  1.0E-10 
Optimality  Tolerance  1.0E-04 
Feasibility  Tolerance  1.0E-05 

Verify  Level  -1 
Print  Level  0 
Hessian  Dimension  50 

Partial  Price  1 
Crash  Option  1 

Jacobian  Sparse 
Solution  No 


END  OTIS 


Figure  2.  SPECS  file  for  large  test  problems 

When  LSSQP  was  run  with  early  termination  of  the  subproblems,  the  parameters 

Optlevel  Partial 
Multiple  Price  5 

were  added  to  the  MINOS  options  file.  These  options  were  not  included  in  the  MINOS 
runs 

on  the  small  test  set  or  used  in  the  NPSOL  runs.  The  statement  “Optlevel  Partial”  invokes 
the  early-termination  strategy  and  halts  the  solution  of  the  subproblem  after  finding  the 
first  stationary  point  (i.e.  when  \\ZTg<2>\\  <  £0pt)-  Invoking  “Multiple  Price  5”  allows  more 
than  one  nonbasic  variable  to  be  deleted  from  the  working  set  during  a  single  minor 
iteration.  It  is  hoped  that  this  will  prevent  the  prototype  algorithm  from  expending  too 
many  minor  iterations  on  a  nonoptimal  working  set. 


5.3  Numerical  results 


59 


5.3.  Numerical  results 

In  this  section  we  compare  the  results  from  LSSQP  with  NPSOL  [GMSW86a]  (version 
4.05)  and  MINOS  [MurS87]  (version  5.3).  The  purpose  of  the  tests  is  to  demonstrate 
the  efficiency  of  the  new  algorithm  for  large  sparse  nonlinear  optimization  and  show  the 
following: 

•  The  method  is  a  practical  alternative  to  the  dense  SQP  method  of  NPSOL  for  large 
sparse  problems. 

•  The  method  is  a  practical  alternative  to  the  Lagrangian  method  of  the  MINOS  system 
for  problems  with  functions  and  gradients  that  are  expensive  to  compute. 

Numerical  result  for  the  small  test  set  are  summarized  in  Table  2.  Complete  results  for 
all  test  problems  are  given  in  Tables  3-16.  The  descriptive  headings  for  the  columns  in  the 
tables  correspond  to  problem  number  and  name,  major  iteration  count,  minor  iteration 
count,  total  function  evaluations,  final  objective  value,  maximum  constraint  violation 
(MINOS  and  LSSQP  only),  and  solution  time  in  CPU  seconds.  The  final  status  of  the 
solver  is  given  in  the  last  column  of  the  tables.  The  notation  is  as  follows: 

Opt:  An  optimal  solution  was  found  (MINOS,  NPSOL,  LSSQP). 

Fail:  The  algorithm  failed  to  find  an  optimal  solution  (NPSOL). 

Itr:  The  solver  reached  the  limit  on  total  major  or  minor  iterations  for  the  problem  (MI¬ 
NOS,  LSSQP). 

Cbi:  The  final  point  did  not  satisfy  the  termination  conditions  but  could  not  be  improved 
upon  (MINOS,  LSSQP). 

Note  that  “function  evaluation”  means  computation  of  both  the  objective  and  constraint 
functions  and  their  gradients. 

5.3.1.  Results  for  the  small  test  set 

Four  runs  were  made  on  the  small  test  set.  One  run  each  was  made  using  MINOS  (Tables  9 
and  10)  and  NPSOL  (Tables  11  and  12),  and  two  runs  were  performed  using  the  prototype 
algorithm  LSSQP.  The  LSSQP  runs  differed  only  in  whether  the  subproblems  were  solved 
to  completion  (Tables  13  and  14)  or  the  early-termination  strategy  was  used  (Tables  15 
and  16). 

A  comparison  of  the  runs  using  LSSQP,  NPSOL  and  MINOS  on  the  small  test  set  is 
summarized  in  Table  2.  The  column  headings  entitled  MINOS  and  NPSOL  are  self- 
explanatory.  LSSQP-0  gives  results  for  LSSQP  in  which  QP  subproblems  were  solved 
to  optimality  and  LSSQP-E  gives  results  for  LSSQP  when  subproblems  were  terminated 
early. 

Each  of  the  three  methods  were  similarly  robust.  The  first  three  terminated  success¬ 
fully  on  76  of  the  80  problems.  For  all  solved  problems,  NPSOL  required  the  fewest  function 
evaluations  ( 1642  and  797  CPU  seconds).  MINOS  required  significantly  more  function  eval¬ 
uations  (5876)  but  the  least  time  (560  CPU  seconds).  LSSQP-0  required  2658  function 
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Algorithm 

wmm'i 

NPSOL 

LSSQP-0 

LSSQP-E 

Problems  attempted 

80 

80 

80 

80 

No.  Optimal 

76 

76 

76 

67 

Total  major  iterations 

646 

1142 

1493 

1339 

Total  minor  iterations 

2477 

2067 

2802 

2086 

Total  function  evals. 

5876 

1642 

2658 

2489 

Total  Time 

559.96 

797.12 

753.97 

944.42 

Table  2.  Summary  of  small  test  problem  results. 


evaluations  and  754  CPU  seconds,  while  LSSQP-E  (with  the  early-termination  strategy) 
solved  only  67  of  the  test  problems  and  required  2489  function  evaluations  and  944  CPU 
seconds. 

Note  that  two  of  our  hypotheses  of  Section  5.1  are  borne  out  by  the  results  on  the 
small  test  problems.  MINOS  required  the  fewest  major  iterations  by  about  2:1  over  NPSOL 
and  LSSQP.  In  addition,  MINOS  required  more  minor  iterations  per  major  iteration,  while 
the  total  number  of  minor  iterations  was  similar  for  all  methods. 

While  the  number  of  function  evaluations  is  a  salient  measure  of  the  efficiency  of  the 
methods  tested,  computational  efficiency  may  also  be  measured  by  total  solution  time. 
NPSOL  provided  the  fastest  solution  time  for  48  out  of  the  80  problems,  while  MINOS 
proved  to  be  fastest  for  21  of  the  80.  The  two  LSSQP  tests  were  fastest  on  only  11 
problems.  Note  that  the  overall  time  for  the  80  problems  is  somewhat  misleading  since 
two  of  the  problems,  namely  numbers  11  and  80  (OPF  30  BUS  and  lVeaj)on),  required  a 
disproportionate  amount  of  solution  time  for  all  four  methods.  Deleting  these  problems 
gives  the  overall  timing  results  as:  NPSOL:  208  seconds,  MINOS:  336  seconds,  LSSQP-O: 
399  seconds.  LSSQP-E  required  365  seconds  to  solve  65  of  the  80  problems  to  optimality. 

On  the  14  problems  with  only  linear  constraints,  MINOS  required  1334  function  evalu¬ 
ations  and  133  CPU  seconds,  while  NPSOL  required  248  functions  and  171  CPU  seconds. 
LSSQP-0  (LSSQP-E)  required  392  (411)  functions  and  233  (223)  seconds.  MINOS  required 
the  fewest  CPU  seconds  to  find  a  minimizer  for  8,  NPSOL  for  4  and  LSSQP  for  2  of  the  14 
problems. 

As  yet,  the  early-termination  option  is  not  as  robust  as  solving  the  subproblems  to 
optimality.  This  could  be  due  to  the  fact  that  only  the  stationary  point  from  the  QP 
subproblem  is  used  for  the  merit  function  (i.e.  the  method  does  not  use  an  auxiliary 
search  direction  as  does  Prieto  [Pri89],  who  provided  more  encouraging  results  using  this 
strategy  in  a  modification  of  NPSOL). 

5.3.2.  Results  for  the  large  test  set 

The  large  set  (problems  81-92)  were  solved  using  MINOS.  NZSOL  and  LSSQP.  NZSOL 
is  a  version  of  NPSOL  in  which  the  QP  subproblems  are  solved  using  QPSOL  [GMSW83] 
instead  of  LSSOL  [GHMSW86].  Like  NPSOL,  NZSOL  is  a  dense  SQP  method,  but  has  been 
modified  to  maintain  a  factorization  of  the  reduced  Hessian  ZTH  Z  instead  of  the  full  (and 
dense)  transformation  QTHQ.  Hence,  NZSOL  is  expected  to  outperform  NPSOL  on  large 
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No. 

Problem 

|  Itns. 

Funs. 

F(x') 

Time 

Stat. 

81 

F4  6  Node 

WSM 

540 

1.0280054062e+00 

644.98 

Opt 

82 

F4  7  Node 

■9 

637 

1245 

1.0099320805e+00 

1769.14 

Opt 

83 

F4  8  Node 

Bl 

792 

1511 

9. 9609431 692e— 01 

2490.31 

Opt 

84 

F4  9  Node 

ml 

3046 

9.901 1118247e— 01 

5660.29 

Opt 

Table  3.  Large  Problem  Results:  MINOS  version  5.3. 


No. 

Problem 

Itns. 

QP  It. 

Funs. 

Fix') 

Time 

Stat. 

81 

F4  6  Node 

18 

64 

31 

1.029382e+00 

175.11 

Opt 

82 

F4  7  Node 

19 

58 

29 

1.003055e+00 

263.37 

Opt 

83 

F4  8  Node 

21 

94 

39 

9.960943e— 01 

431.41 

Opt 

84 

F4  9  Node 

19 

95 

32 

9.901111e— 01 

530.23 

Opt 

85 

F4  10  Node 

21 

127 

37 

9.875875e— 01 

844.71 

Opt 

86 

F4  11  Node 

27 

80 

33 

9.8631 1  le— 01 

1264.22 

Opt 

87 

F4  12  Node 

28 

85 

34 

9.851976e— 01 

1748.94 

Opt 

88 

F4  14  Node 

25 

78 

30 

9.84901  le-01 

2383.98 

Opt 

89 

F4  15  Node 

26 

108 

31 

9.834741e— 01 

3154.55 

Opt 

90 

F4-NP  6  Node 

22 

HHI 

30 

1.001819e+00 

181.78 

Opt 

91 

F4-NP  15  Node 

33 

Hi 

38 

9.695897e— 01 

3858.89 

Opt 

92 

VTOL  Descent 

20 

159 

25 

—  1.683044e— 00 

10351.63 

Opt 

Table  4.  Large  Problem  Results:  NZSOL 


problems  with  few  degrees  of  freedom. 

Results  for  MINOS  and  NZSOL  are  given  in  Tables  3  and  4.  Results  for  LSSQP-O  and 
LSSQP-E  are  given  in  Tables  5  and  6. 

The  results  of  the  tests  on  these  larger  problems  show  that  LSSQP-0  is  very  competi¬ 
tive  with  MINOS  and  NZSOL.  NZSOL  recpiired  the  fewest  function  evaluations  and  major 
and  minor  iterations  than  either  MINOS  or  LSSQP-0  on  all  the  test  problems.  LSSQP-0 
required  many  fewer  function  evaluations  and  exhibited  faster  solution  times  than  MINOS. 
In  addition,  LSSQP-0  was  very  competitive  with  NZSOL  with  respect  to  solution  time. 

One  of  the  major  differences  between  LSSQP-0  and  NZSOL  is  in  the  number  of  funct  ion 
evaluations  and  major  and  minor  iterations  required  to  solve  the  problems  to  the  specified 
accuracy.  For  example,  LSSQP  required  three  to  thirty  times  as  many  functions  as  NZSOL. 
One  reason  for  this  could  be  the  differences  in  the  form  of  the  quasi-Newton  update. 
Another  reason  could  be  the  form  of  the  null-space  basis  Z. 

It  should  be  noted  that  extensive  tests  using  LSSQP  on  the  large  test  problems  indicate 
that  it  does  not  have  the  same  level  of  robustness  offered  by  NZSOL.  Modification  of  one 
or  more  of  the  run-time  parameters  may  lead  to  significantly  slower  solution  times.  One 
method  for  increasing  the  robustness  of  LSSQP  would  be  to  modify  the  linesearch  routines 
to  account  for  the  lack  of  precision  in  the  gradients  of  these  problems.  The  tests  reported 
in  this  section  used  a  'function  plus  gradient'  linesearch  even  though  the  gradients  for  these 
large  problems  are  obtained  by  differencing.  Preliminary  tests  have  shown  that  a  ‘function 
only’  linesearch  may  produce  a  more  robust  (and  efficient)  version  of  LSSQP.  More  tests 
are  required  to  determine  the  exact  cause  of  the  large  differences  in  performance  between 
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No. 

Problem 

Itns. 

QP  It. 

Funs. 

F(x ’) 

Time 

Stat. 

81 

F4  6  Node 

40 

373 

82 

1.0280052864e+00 

104.27 

Opt 

82 

F4  7  Node 

42 

455 

97 

1. 0030551 46  le+00 

151.83 

Opt 

83 

F4  8  Node 

36 

382 

70 

9.9471482986e— 01 

126.03 

Opt 

84 

F4  9  Node 

40 

394 

91 

9.901 1108551e— 01 

160.97 

Opt 

85 

F4  10  Node 

76 

1139 

221 

9.8758697042e— 01 

506.04 

Opt 

86 

F4  11  Node 

85 

1249 

254 

9.8531899149e— 01 

645.75 

Opt 

87 

F4  12  Node 

150 

2770 

532 

9.8519559525e— 01 

1536.22 

Opt 

88 

F4  14  Node 

182 

2831 

794 

9.8490 1 91 159e— 01 

2563.11 

Opt 

89 

F4  15  Node 

246 

5067 

996 

9.83472451 92e— 01 

3901.23 

Opt 

90 

F4-NP  6  Node 

63 

255 

212 

1.0018209554e+00 

232.15 

Opt 

91 

F4-NP  15  Node 

281 

4906 

1196 

9.8165983233e— 01 

4793.97 

Opt 

92 

VTOL  Descent 

105 

1857 

320 

—  1 ,6825846298e— 00 

2363.40 

Opt 

Table  5.  Large  Problem  Results:  (LSSQP-O)  Full  completion. 


No. 

Problem 

Itns. 

QP  It. 

Funs. 

F(z’) 

Time 

Stat.  |! 

81 

F4  6  Node 

64 

279 

144 

1.0280054023e+00 

187.85 

MtfXM 

82 

F4  7  Node 

44 

204 

94 

1 .003055 1 798e+00 

143.22 

|E5fl 

83 

F4  8  Node 

143 

452 

143 

9.947 1483247e— 01 

544.59 

mm 

1  84 

F4 9  Node 

85 

393 

235 

9.9011120191e— 01 

455.78 

»J9S 

85 

F4  10  Node 

143 

888 

494 

9.87587601 21e— 01 

1063.13 

R  91 

86 

F4  11  Node 

102 

625 

325 

9.8531969486e— 01 

786.71 

W*  91 

87 

F4  12  Node 

263 

1418 

804 

9. 85 1 963891 4e— 01 

2166.77 

El 1 

Table  6.  Large  Problem  Results:  (LSSQP-E)  Early  termination. 


NZSOL  and  LSSQP. 

For  the  largest  problem  in  the  test  set,  the  V'TOL  Descent  problem,  the  value  of  sparse- 
matrix  operations  becomes  very  clear.  Even  though  NZSOL  requires  a  twelfth  of  the  (very 
expensive)  function  evalutions,  the  total  CPU  time  is  more  than  four  times  that  of  LSSQP. 

The  results  from  MINOS  show  that  the  method  requires  many  function  evaluations, 
which  results  in  a  substantial  increase  in  the  solution  times  compared  to  NZSOL  and 
LSSQP-O.  As  the  problem  size  increases,  the  solution  times  grow  rapidly.  For  this  rea¬ 
son  MINOS  was  tested  on  only  the  four  smallest  F4  Minimum  time-to-climb  problems 
(problems  81-84). 

The  performance  of  MINOS  on  these  problems  is  somewhat  anomalous.  We  see  that 
the  hypothesis  that  MINOS  takes  fewer  majci  iterations  still  holds,  but  the  number  of 
minor  iterations  per  major  iteration  relative  to  those  required  by  NZSOL  and  LSSQP  is 
significantly  more  than  for  the  small  test  problems.  The  computation  of  the  constraints 
and  gradients  for  these  problems  are  very  expensive  (see  the  Appendix).  Since  MINOS 
must  evaluate  the  constraints  and  gradients  many  times  to  solve  each  subproblem  it  can¬ 
not  match  the  SQP  methods,  which  only  evaluate  functions  and  gradients  after  each 
subproblem  (requiring  fewer  total  function  evaluations). 

As  with  the  small  test  set,  the  early-termination  strategy  did  not  perform  as  well  as 
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the  full-optimization  strategy.  LSSQP-E  was  tested  on  the  first  6  problems  in  the  large  test 
set  and  required  60%  more  CPU  than  LSSQP-O.  It  also  required  more  major  iterations 
and  function  evaluations  but  fewer  minor  iterations  than  LSSQP-O.  LSSQP-E  required 
more  solution  time  on  all  but  the  7-node  problem  over  that  required  by  LSSQP-O.  It  is 
important  to  note  that  the  QP  iterations  on  these  optimal  control  problems  are  relatively 
cheap  compared  to  function  and  gradient  evalutions.  Since  the  early-termination  strategy 
is  designed  to  reduce  the  number  of  QP  iterations  required,  it  is  not  expected  to  impact 
the  solution  of  problems  such  as  these.  Still,  it  is  encouraging  that  the  early-termination 
algorithm  (even  with  its  acknowledged  deficiencies)  proved  to  be  robust  on  the  set  of 
large  test  problems.  Moreover,  it  did  achieve  a  reduction  in  the  number  of  QP  iterations 
required  to  find  a  minimizer.  More  testing  is  needed  on  large-scale  problems  for  which  the 
QP  iterations  are  similar  in  cost  to  function  evaluations. 

5.3.3.  A  final  note  on  computational  results 

There  are  several  criteria  that  should  be  used  to  measure  the  efficiency  of  an  algorithm. 
Two  important  measures  are  speed  (measured  in  CPU  time  required  for  solution)  and  the 
storage  required  by  all  data  structures  used  by  the  algorithm.  LSSQP  is  similar  to  MINOS 
in  this  last  respect,  its  data  structures  being  almost  identical.  Because  of  their  sparse- 
matrix  technology,  both  of  these  methods  have  an  advantage  over  NPSOL  (or  NZSOL)  on 
large  problems  whose  Jacobian  is  sparse. 

Sparse-matrix  technology  does  not  give  an  advantage  to  either  MINOS  or  LSSQP  over 
NPSOL  for  problems  in  the  small  test  set.  However,  because  the  function  and  gradient 
evaluations  are  relatively  cheap  for  these  problems,  the  CPU  time  is  highly  correlated  with 
the  number  of  basis  factorizations  required  to  find  a  minimizer  (at  least  for  the  larger  of 
the  small  test  problems). 

For  the  t  ra  jectory  optimization  problems,  if  sparse-matrix  methods  are  employed  the 
computational  cost  of  solving  the  problem  is  dominated  by  the  cost  of  function  and  gradi¬ 
ent  evaluations.  For  these  problems  the  salient  measure  of  efficiency  is  time.  For  MINOS 
and  the  two  variants  of  LSSQP,  the  solution  time  is  highly  correlated  with  the  number 
of  function  evalutions  required.  As  the  problems  grow  in  size,  the  cost  of  the  dense  TQ 
factorization  required  within  NZSOL  increasingly  impacts  the  solution  time.  As  a  result, 
LSSQP  exhibits  a  growing  advantage  in  time  over  NZSOL.  This  is  clearly  shown  by  the 
results.  The  worst  relative  performance  for  NZSOL  is  for  the  largest  problem. 

5.4.  Conclusions 

We  have  proposed  a  new  algorithm  for  the  solution  of  large-scale  nonlinear  programming 
problems.  Our  approach  differs  substantially  from  previous  methods  because  the  QP 
subproblems  are  solved  using  sparse  techniques  and  we  approximate  only  the  reduced 
Hessian  of  the  Lagrangian.  The  theoretical  convergence  properties  of  the  new  method  are 
the  same  as  for  dense  implementations  that  approximate  only  the  reduced  Hessian  of  the 
Lagrangian.  Based  on  the  preliminary  numerical  test  results  for  the  algorithm,  there  is 
every  reason  to  expect  that  the  algorit  hm  will  prove  useful  in  practice  for  many  large-scale 
problems  in  which  the  nonlinear  function  and  constraints  are  computationally  expensive 
to  evaluate. 
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5.4.1.  Future  work 

Many  modifications  could  be  made  to  the  prototype  SQP  algorithm.  Some  possibilities 
follow: 

1.  Use  of  a  '‘limited  size ”  quasi-Newton  approximation  to  the  reduced  Hessian.  In 
this  strategy,  an  approximation  to  a  pseudo  reduced  Hessian  of  a  fixed  size  (e.g.  of 
dimension  200)  would  be  updated  at  each  iteration.  This  would  include  the  reduced 
Hessian  for  the  current  superbasic  variables  as  well  as  for  other  pseudo  superbasic 
variables.  The  latter  would  be  a  subset  of  the  remaining  nonbasic  variables  and 
could  be  identified  as  those  variables  that  would  be  expected  to  be  superbasic  or 
had  “recently”  been  superbasic  (but  were  not  currently).  This  would  allow  curvature 
information  associated  with  these  variables  to  be  maintained  throughout  the  solution 
process  and  may  lead  to  faster  convergence.  An  observed  feature  of  LSSQP  and 
NZSOL  on  large  problems  is  that  the  number  of  QP  iterations  does  not  always 
decrease  to  1  in  a  neighborhood  of  the  solution  but  is  usually  some  small  number. 
Variables  on  a  bound  with  small  reduced  costs  may  enter  and  leave  the  superbasic 
set.  The  proposed  modification  would  prevent  the  curvature  for  these  variable  from 
being  lost. 

2.  Use  of  second  derivatives  in  the  solution  of  subproblems.  In  general,  we  would  expect 
to  obtain  faster  rates  of  convergence  (i.e.  quadratic  versus  two-step  superlinear)  at 
the  expense  of  a  more  complicated  algorithm.  With  exact  second  derivatives  it  would 
be  necessary  to  have  a  more  complex  linesearch  as  well  as  a  method  for  maintaining 
a  positive-definite  reduced  Hessian  and  routines  for  obtaining  directions  of  negative 
curvature  (see  [Pri89]  for  example). 

3.  Use  of  single-phase  subproblems.  Such  a  method  would  incorporate  a  merit  function 
within  the  subproblem  itself,  and  should  perform  well  on  large-scale  problems  that 
require  a  large  number  of  minor  iterations  in  order  to  obtain  a  feasible  point  for  the 
subproblem.  Such  a  modification  would  also  allow  for  infeasible  subproblems. 

4.  Use  of  the  present  algorithm  at  a  lower  level.  Part  of  the  large-scale  SQP  algorithm 
in  this  report  is  an  algorithm  to  solve  a  QP  based  on  the  provision  of  a  reduced 
Hessian.  Such  an  algorithm  could  be  used  to  solve  the  subproblems  in  MINOS.  It 
would  have  three  levels  of  iteration.  At  the  lowest  level,  function  and  gradients  would 
not  be  required.  At  the  intermediate  level,  subproblems  would  be  solved  with  the  use 
of  an  augmented  Lagrangian  objective  function  (as  is  done  now  with  MINOS).  The 
top  level  corresponds  to  a  major  iteration  and  makes  use  of  a  merit  function  and  a 
linesearch.  We  expect  that  such  an  algorithm  would  improve  upon  the  performance 
of  MINOS  on  problems  for  which  the  functions  were  expensive.  It  may  be  expected 
to  do  better  than  the  algorithm  described  here  on  problems  for  which  the  function 
evaluations,  although  expensive,  did  not  overwhelm  the  total  computational  effort. 
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No. 

Problem 

Comment 

n 

Leon. 

Neon. 

F{  xw) 

1 

Hexagon 

Sparse  version 

9 

4 

14 

— 1.349963e+00 

2 

HS  108 

Sparse  version 

9 

0 

13 

— 8.660254e— 01 

3 

KS  372 

Sparse  version 

9 

SH 

12 

1.339009e+04 

4 

MHW  4 

Margaret  Wright 

5 

Ha 

3 

2.787187e+01 

5 

MHW  9 

Margaret  Wright 

5 

IH 

3 

— 4.247468e+01 

6 

HS  14 

2 

i 

1 

1.393465e+00 

7 

HS  26 

«op.  =  10"1 

3 

0 

1 

0.000000e+00 

8 

HS  43 

Rosen-Suzuki 

4 

0 

3 

-4.400000e+01 

9 

HS  65 

3 

0 

1 

9.535289e— 01 

10 

KS  231 

2 

2 

0 

0.000000e+00 

11 

OPF  30  BUS 

67 

0 

60 

9.720420e— 01 

12 

QP  problem 

7 

7 

0 

—  1.847785e+06 

13 

LC7 

7 

7 

0 

9.295973e+05 

14 

Norway 

7 

6 

0 

-2.402344e+01 

15 

Singular 

2 

0 

2 

0.000000e+00 

16 

Alan  Manne 

Economic  growth 

30 

10 

— 2.670099e+00 

17 

Steinke2 

fop,  =  10"1 

6 

0 

4.000131e— 04 

18 

Square  root  1 

9 

0 

2.500000e+03 

19 

Square  root  2 

9 

0 

2.999795e+00 

20 

Square  root  3 

9 

0 

2.000000e+00 

21 

Square  root  4 

4 

0 

2.500000e+01 

22 

Boggs-Tolle  1 

o 

0 

i 

-1.000000e+00 

23 

Boggs-Tolle  2 

3 

0 

i 

3.256820e— 02 

24 

Boggs-Tolle  3 

5 

3 

0 

4.093023e+00 

25 

Boggs-Tolle  4 

3 

1 

1 

— 4.551055e— 03 

26 

Boggs-Tolle  5 

HS  63 

3 

1 

1 

9.577426e+02 

27 

Boggs-Tolle  6 

HS  77 

5 

HKE1S 

2 

2.415051e— 01 

28 

Boggs-Tolle  7 

5 

0 

3 

3.065000e+02 

29 

Boggs-Tolle  8 

5 

2 

1.000000e+00 

30 

Boggs-Tolle  9 

HS  39 

4 

0 

2 

-1.000000e+00 

31 

Boggs-Tolle  10 

2 

0 

2 

—  1.000000e+00 

32 

Boggs-Tolle  11 

HS  79 

5 

0 

3 

9.171343e— 02 

33 

Boggs-Tolle  12 

5 

Hniii 

3 

6.188119e+00 

34 

Powell  triangles 

7 

0 

5 

2.331371e+01 

35 

Powell  badly  scaled 

2 

0 

1 

3.586574e— 03 

36 

Powell  wriggle 

2 

0 

— 1.91 1618e — 16 

37 

Powell  wriggle 

a-o  =  (0.097,0.063) 

2 

0 

—  1.911618e— 16 

38 

Powell- Maratos 

2 

0 

1 

-1.000000e+00 

39 

HS  72 

6opl  =  10 

4 

0 

2 

7.266794e+02 

40 

HS  73 

Cattle  feed 

4 

2 

1 

2.989438e+01 

Table  7.  Small  problem  statistics  (1-40). 
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No. 

Problem 

Comment 

n 

Leon. 

Neon. 

F(z') 

41 

HS  107 

9 

0 

6 

5.05501  le+03 

42 

Mukai-Polak 

6 

0 

2 

5.000000e+00 

43 

Penalty  1 

a 

50 

1 

0 

4.313635e— 02 

44 

Penaltyl 

c 

50 

1 

0 

4.313635e— 02 

45 

HS  32 

3 

1 

1 

1.000000e+00 

46 

HS  46 

5 

2 

0.000000e+00 

47 

HS  51 

5 

0 

0.000000e+00 

48 

HS  52 

5 

0 

5.32664 8e+00 

49 

HS  53 

5 

0 

4.093023e+00 

50 

HS  13 

2 

0 

1 

1.000000e+00 

1  51 

HS  64 

3 

mm l 

1 

6.299842e+03 

52 

HS  70 

4 

1111 

1 

7.498464e— 03 

53 

HS  71 

4 

IBI 

2 

1.701402e+01 

54 

HS  74 

4 

2 

3 

5.126498e+03 

55 

HS  75 

4 

2 

3 

5.174413e+03 

56 

HS  78 

5 

0 

3 

-2.919700e+00 

57 

HS  80 

5 

0 

3 

5.394985e— 02 

58 

HS  81 

5 

3 

5.394985e— 02 

59 

HS  84 

5 

0 

3 

-5.329025e+06 

60 

HS  85 

^opt  =10 

5 

0 

38 

— 1.9051 34e+00 

61 

HS  86 

Colville  No.  1 

5 

10 

0 

-3.234868e+01 

62 

HS  93 

Transformer  design 

6 

HI 

2 

1.350760e+02 

63 

HS  95 

6 

!§S1 

4 

1.561953e— 02 

64 

HS  96 

6 

HI 

4 

1.561953e— 02 

65 

HS  97 

6 

0 

4 

3.135809e+00 

66 

HS  98 

6 

0 

4 

3.135809e+00 

67 

HS  99 

7 

2 

— 8.310799e+08 

68 

HS  100 

7 

0 

4 

6.806301e+02 

69 

HS  104 

Reactor  design 

8 

0 

5 

3.951 163e+00 

70 

HS  109 

9 

1 

8 

5.362069e+03 

71 

HS  111 

10 

3 

-4.776109e+01 

72 

HS  112 

Chemical  equilibrium 

10 

3 

0 

— 4.776109e+01 

73 

HS  113 

Wong  No.  2 

10 

3 

5 

2.430621e+01 

74 

HS  114 

Alkylation  process 

10 

5 

6 

-1.768807e+03 

75 

HS  117 

Colville  No.  2,  Shell  dual 

15 

0 

5 

3.234867e+03 

76 

HS  118 

LC  problem 

15 

17 

0 

6.648204e+02 

77 

HS  119 

Colville  No.  7 

16 

8 

0 

2.448997e+02 

78 

HS  83 

Dembo  No.  2 

5 

0 

6 

1.012243e+04 

79 

HS  106 

Dembo  No.  5 

8 

3 

3 

7.049331e+04 

80 

Weapon  assignment 

100 

12 

0 

—  1.735019e+03 

Table  8.  Small  problem  statistics  (41-80). 
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No. 

Problem 

Itns. 

LC  It. 

Funs. 

W) 

viol. 

Time 

Slot. 

1 

Hexagon 

9 

■S3 

— 1.349963e+00 

1.03e— 13 

6.87 

Opt 

2 

HS  108 

9 

mm 

— 8.660254e— 01 

2.94e— 14 

4.90 

Opt 

3 

KS  372 

12 

40 

84 

1.339009e+04 

1.73e— 10 

5.13 

Opt 

4 

MHW  4 

6 

13 

31 

2.787187e+01 

7.93e— 12 

1.14 

Opt 

5 

MHW  9 

19 

37 

130 

— 4.247468e+01 

2.47e— 13 

3.95 

Opt 

6 

HS  14 

7 

3 

17 

1.393465e+00 

0.00e+00 

.55 

Opt 

7 

HS  26 

16 

76 

194 

1.808355e— 12 

3.96e— 06 

4.13 

Opt 

8 

HS  43 

15 

56 

117 

— 4.400000e+01 

2.22e— 16 

3.67 

Opt 

9 

HS  65 

7 

14 

39 

9.535289e— 01 

2.67e— 15 

.97 

Opt 

10 

KS  231 

1 

5 

13 

1.449526e— 25 

0.00e+00 

.27 

Opt 

11 

OPF  30  BUS 

8 

127 

197 

9.720420e— 01 

5.33e— 15 

185.17 

Opt 

12 

QP  problem 

1 

10 

18 

-1.847785e+06 

0.00e+00 

.57 

Opt 

13 

LC7 

1 

9 

12 

9.295973e+05 

0.00e+00 

.47 

Opt 

14 

Norway 

1 

6 

8 

— 4.086420e+00 

0.00e+00 

.39 

Opt1 

15 

Singular 

18 

1 

20 

0.000000e+00 

1.16e— 10 

1.13 

Opt 

16 

Alan  Manne 

4 

20 

36 

— 2.670099e+ 00 

2.31e— 10 

4.10 

Opt 

17 

Stein  Ice'-' 

— 

— 

— 

— 

— 

— 

Itr 

18 

Square  root  1 

5 

2 

9 

2.500000e+03 

9.71e— 16 

.75 

Opt 

19 

Square  root  2 

27 

0 

29 

3.000000e+00 

1.42e— 14 

3.54 

Opt 

20 

Square  root  3 

5 

4 

14 

2.000000e+00 

5.43e— 10 

.85 

Opt 

21 

Square  root  4 

21 

0 

23 

2.500000e+03 

2.78e— 17 

1.67 

Opt 

22 

Boggs- Tolle  1 

17 

21 

67 

-1.000000e+00 

4.12e— 12 

1.72 

Opt 

23 

Boggs  Tolle  2 

8 

20 

66 

3.256820e— 02 

4.00e— 15 

1.19 

Opt 

24 

Boggs-' Tolle  3 

1 

3 

10 

4.093023e+00 

0.00e+00 

.24 

Opt 

25 

Boggs-Tolle  4 

7 

6 

18 

-7.317428e+01 

1.78e— 15 

.71 

Opt 

26 

Boggs- Tolle  5 

— 

— 

— 

— 

— 

— 

Itr 

27 

Boggs- Tolle  6 

15 

49 

121 

2.415051e— 01 

4.44e— 16 

3.19 

Opt 

28 

Boggs-Tolle  7 

4 

3 

12 

3.603798e+02 

2.00e— 12 

.43 

Opt 

29 

Boggs-  Tolle  8 

12 

2 

16 

1.000000e+00 

2.38e— 07 

.87 

Opt 

30 

Boggs-Tolle  9 

11 

21 

54 

—  1.000000e+00 

1.39e— 13 

1.65 

Opt 

31 

Boggs  Tolle  10 

8 

1 

12 

-1.000000e+00 

2.78e— 17 

.59 

Opt 

32 

Boggs  Tolle  11 

9 

19 

52 

9. 17134  3e— 02 

2.03e— 11 

1.67 

Opt 

33 

Boggs-Tolle  12 

19 

75 

191 

6.188119e+00 

1.42e— 14 

4.92 

Opt 

34 

Powell  triangles 

11 

34 

82 

2.331371e+01 

2.22e— 16 

3.37 

Opt 

35 

Powell  oad  scale 

6 

4 

19 

1.146177e— 12 

1.17e— 12 

.54 

Opt 

36 

Powell  wriggle  SI 

5 

7 

25 

1.061979e+00 

0.00e+00 

.67 

Opt 

37 

Powell  wriggle  S2 

Klis 

— 

— 

— 

— 

— 

Itr 

38 

Powell- Maratos 

10 

36 

—  1.000000e+00 

0.00e+00 

.96 

Opt 

39 

HS  72 

1 

7 

7.266819e+02 

1.16e— 16 

.33 

Opt 

40 

HS  73 

1 _ L 

7 

18 

2.989438e+01 

4.44e— 16 

.71 

Opt 

1  Converged  to  a  different  minimizer. 


Table  9.  Small  problems:  MINOS  (1-40). 
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|  No.  Problem  | 

41 

HS  107 

42 

Mukai-Polak 

43 

Penaltyl  a 

44 

Penalty  1  c 

45 

HS  32 

46 

HS  46 

47 

HS  51 

48 

HS  52 

49 

HS  53 

50 

HS  13 

51 

HS  64 

52 

HS  70 

53 

HS  71 

54 

HS  74 

55 

HS  75 

56 

HS  78 

57 

HS  80 

58 

HS  81 

59 

HS  84 

60 

HS  85 

61 

HS  86 

62 

IIS  93 

63 

HS  95 

64 

HS  96 

65 

HS  97 

66 

HS  98 

67 

HS  99 

68 

HS  100 

69 

HS  104 

70 

HS  109 

71 

HS  111 

72 

HS  112 

73 

HS  113 

74 

HS  114 

75 

HS  117 

76 

HS  118 

77 

HS  119 

78 

HS  83 

79 

HS  106 

80 

Weapon 

Ittu.  LC  It. 


12 


/'(*’) 

viol. 

Time 

Stat. 

5.055012e+03 

4.51e— 17 

1.53 

Opt 

5.000000e+00 

0.00e+00 

6.44 

Opt 

4.313635e— 02 

0.00e+00 

43.92 

Opt 

4.313635e— 02 

0.00e+00 

43.64 

Opt 

1.000000e+00 

0.00e+00 

.69 

Opt 

9.862650e— 15 

7.57e— 07 

3.11 

Opt 

9.629650e— 34 

0.00e+00 

.21 

Opt 

6.000000e+00 

0.00e+00 

.25 

Opt' 

4.093023e+00 

0.00e+00 

.16 

Opt 

1.000069e+00 

0.00e+00 

1.88 

Opt 

6.299842e+03 

2.54e— 08 

2.21 

Opt 

7.498464e— 03 

0.00e+00 

5.40 

Opt 

1.701402e+01 

8.88e— 16 

1.65 

Opt 

5.126498e+03 

1.42e— 14 

2.65 

Opt 

5.174413e+03 

4.83e— 13 

1.80 

Opt 

-2.919700e+00 

2.08e— 14 

1.07 

Opt 

5.394985e— 02 

2.35e— 12 

1.66 

Opt 

5.394985e— 02 

3.19e— 11 

1.73 

Opt 

— 5.191258e+06 

0.00e+00 

2.12 

Opt1 

—  1.905155e+00 

6.91e— 11 

4.85 

Opt 

— 3.234868e+01 

0.00e+00 

.74 

Opt 

1.350760e+02 

2.07e— 14 

2.44 

Opt 

1.561953e— 02 

0.00e+00 

.32 

Opt 

1.561953e— 02 

0.00e+00 

.24 

Opt 

4.071246e+00 

0.00e+00 

.92 

Opt' 

4.071246e+00 

0.00e+00 

.56 

Opt 

-8.310799e+08 

1.02e— 10 

8.78 

Opt 

6.839810e+02 

1.13e— 11 

4.27 

Opt 

Itr 

5.362069e+03 

1.39e— 13 

7.18 

Opt 

-4.776109e+01 

9.00e— 14 

14.31 

Opt 

-4.776109e+01 

0.00e+00 

2.26 

Opt 

2.430621e+01 

8.88e— 16 

10.64 

Opt 

—  1.768807e+03 

5.16e— 09 

6.04 

Opt 

3.234868e+01 

0.00e+00 

7.11 

Opt 

6.648204e+02 

0.00e+00 

1.20 

Opt 

2.448997e+02 

0.00e+00 

2.46 

Opt 

1.012243e+04 

6.89e— 13 

.67 

Opt 

2.100000e+03 

0.00e+00 

4.58 

Opt' 

-1.735019e+03 

0.00e+00 

38.33 

Opt 

Converged  to  a  different  niinimizer. 


Table  10.  Small  problems:  MINOS  (41-80). 
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No. 

Problem  j 

1 

Hexagon 

2 

HS  108 

3 

KS  372 

4 

MHW  4 

5 

MHW  9 

6 

HS  14 

7 

HS  26 

8 

HS  43 

9 

HS  65 

10 

I\S  231 

11 

OPF  30  BUS 

12 

QP  problem 

13 

LC7 

14 

Norway 

15 

Singular 

16 

Alan  Manne 

17 

Steinke2 

18 

Square  root  1 

19 

Square  root  2 

20 

Square  root  3 

21 

Square  root  4 

22 

Boggs-Tolle  1 

23 

Boggs- Tolle  2 

24 

Boggs-'l'olle  3 

25 

Boggs-Tolle  4 

26 

Boggs-Tolle  5 

27 

Boggs-Tolle  6 

28 

Boggs-Tolle  7 

29 

Boggs- Tolle  8 

30 

Boggs-Tolle  9 

31 

Boggs-Tolle  10 

32 

Boggs- Tolle  11 

33 

Boggs-Tolle  12 

34 

Powell  triangles 

35 

Powell  bad  scale 

36 

Powell  wriggle  SI 

37 

Powell  wriggle  S2 

38 

Powell- Maratos 

39 

HS  72 

40 

HS  73 

Itm.  |  QP  It. 


mm 


.1349963e+01 
.8660254e+00 
.1339009e+04 
.2787187e+02 
.3618808e+02 
,1393465e+00 
.1969433e— 20 
,4400000e+02 
,9535289e+00 
,1339909c— 20 
,9927005e+00 
,1847785e+07 
,9295973e+06 
,2402344e+02 
.0000000e+00 
,2670099e+01 


,2999795e+01 

2000000e+01 

.1000000e+01 
,3256820e— 01 
,4093023e+01 
,4551055e— 03 
,9577426e+03 
.2415051e+00 
,3065000e+03 
.1000000e+01 
.1000000e+01 
.1000000e+01 
,9171343e— 01 
,6188119e+01 
,2331371e+02 
,1305195e— 23 
,1911618e— 15 
.  25306 12e- 10 
.1000000e+01 
,7266794e+03 
.2989438C+02 


10.36 

1.31 

3.71 

.49 

3.39 

.81 

.70 

1.41 

468.12 

1.10 

.76 

1.23 

1.03 

21.13 


Converged  to  a  different  minimizer. 


Table  11.  Small  problems:  NPSOL  (1-40). 
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No. 

Problem 

lint. 

QP  It. 

Funs. 

Time 

Slat. 

41 

HS  107 

11 

27 

18 

.5055012e+04 

2.77 

Opt 

42 

Mukai-Polak 

10 

13 

16 

.5000000e+01 

1.08 

Opt 

43 

Penalty  1  a 

16 

77 

18 

.4313635e— 01 

20.01 

Opt 

44 

Penaltyl  c 

29 

152 

85 

.4313635e— 01 

24.35 

Opt 

45 

HS  32 

2 

3 

3 

.lOOOOOOe+Ol 

.25 

Opt 

46 

HS  46 

55 

56 

58 

.1936782e— 22 

5.26 

Opt 

47 

HS  51 

2 

o 

5 

.3851860e— 32 

.18 

Opt 

48 

HS  52 

2 

2 

5 

.5326648e+01 

.19 

Opt 

49 

HS  53 

2 

2 

5 

4093023e+01 

.19 

Opt 

50 

HS  13 

22 

13 

23 

.1002181e+01 

1.29 

Opt 

51 

HS  64 

29 

47 

39 

,6299842e+04 

2.34 

Opt 

52 

HS  70 

36 

39 

39 

.7498464e— 02 

3.33 

Opt 

53 

HS  71 

5 

9 

6 

.1701402e+02 

.53 

Opt 

54 

HS  74 

10 

14 

15 

.512649«e+04 

1.17 

Opt 

55 

HS  75 

6 

7 

10 

.5174413e+04 

.72 

Opt 

56 

HS  78 

10 

11 

14 

—  .2919700e+01 

1.15 

Opt 

57 

HS  80 

8 

8 

10 

.5394985e— 01 

.92 

Opt 

58 

HS  81 

14 

15 

20 

5394985e— 01 

1.57 

Opt 

59 

HS  84 

— 

— 

— 

— 

Fail 

60 

HS  85 

17 

33 

— ,1905155e+01 

4.00 

Opt 

61 

HS  86 

6 

11 

—  .3234868e+02 

.62 

Opt 

62 

HS  93 

12 

14 

,1350760e+03 

1.36 

Opt 

63 

HS  95 

1 

1 

.1561953e— 01 

.15 

Opt 

64 

HS  96 

1 

1 

.1561953e— 01 

.17 

Opt 

65 

HS  97 

3 

3 

,3135809e+01 

.40 

Opt 

66 

HS  98 

3 

8 

,3135809e+01 

.43 

Opt 

67 

HS  99 

23 

74 

44 

—  .8290102e+09 

3.99 

Opt 

68 

HS  100 

14 

18 

29 

.6806301e+03 

2.07 

Opt 

69 

HS  104 

18 

23 

20 

.3951 163e+01 

3.36 

Opt 

70 

HS  109 

11 

25 

13 

.5362069e+04 

3.23 

Opt 

71 

HS  111 

41 

44 

64 

—  ,4773239e+02 

8.08 

Opt 

72 

HS  112 

19 

54 

39 

—  .4776109e+02 

2.78 

Opt 

73 

HS  113 

14 

38 

19 

.2430621e+02 

3.12 

Opt 

74 

HS  114 

18 

36 

19 

—  .1768807e+04 

3.81 

Opt 

75 

HS  117 

17 

96 

21 

.3234868e+02 

6.75 

Opt 

76 

HS  118 

4 

20 

6 

.6648204e+03 

1.35 

Opt 

77 

HS  119 

12 

41 

16 

,2448997e+03 

4.25 

Opt 

78 

HS  83 

4 

4 

6 

,1012243e+05 

.54 

Opt 

79 

HS  106 

17 

30 

21 

.7049248e+04 

2.90 

Opt 

80 

Weapon 

96 

244 

98 

-  1735019e+04 

120.78 

Opt 

Table  12.  Small  problems:  NPSOL  (41-80). 
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No. 

Problem 

Itns. 

QP  It. 

Funs. 

*•<*’> 

viol. 

Time 

Slat. 

1 

Hexagon 

23 

56 

36 

-1.349963e+00 

5.13e— 13 

11.53 

Opt 

2 

HS  108 

7 

7 

10 

— 8.660254e— 01 

4.85e— 14 

1.88 

Opt 

3 

KS  372 

35 

205 

87 

1.339009e+04 

5.93e— 12 

26.26 

Opt 

4 

MHW  4 

15 

19 

28 

2.787187e+01 

2.84e— 11 

2.74 

Opt 

5 

MHW  9 

14 

19 

27 

-4.247468e+01 

5.42e— 12 

2.42 

Opt 

6 

HS  14 

8 

2 

14 

1.393465e+00 

3.21e— 12 

.81 

Opt 

7 

HS  26 

37 

36 

47 

7.028190e— 12 

5.55e— 17 

4.59 

Opt 

8 

HS  43 

13 

17 

17 

-4.400000e+01 

7. lie— 11 

2.16 

Opt 

9 

HS  65 

17 

25 

22 

9.535289e— 01 

2.06e— 13 

2.38 

Opt 

10 

KS  231 

15 

16 

44 

6.321700e— 21 

O.OOe-fOO 

1.51 

Opt 

11 

OPF  30  Bus 

30 

112 

40 

9.720420e— 01 

5.61e— 15 

178.95 

Opt 

12 

QP  problem 

11 

26 

16 

-1.847785e+06 

0.00e+00 

2.51 

Opt 

13 

LC7 

10 

24 

14 

9.295973e+05 

0.00e+00 

2.04 

Opt 

14 

Norway 

6 

20 

9 

-2.402344e+01 

0.00e+00 

1.66 

Opt 

15 

Singular 

20 

1 

23 

0.000000e+00 

7.28e— 12 

1.75 

Opt 

16 

Alan  Manne 

14 

35 

22 

-2.670099e+00 

8.98e— 14 

9.64 

Opt 

17 

Steinke2 

2 

14 

5 

4.142865e— 04 

1.02e— 07 

.77 

Opt 

18 

Square  root  1 

17 

2 

47 

2.499997e+03 

2.50e— 09 

4.19 

Opt 

19 

Square  root  2 

18 

0 

25 

2.999939e+00 

6.43e— 10 

3.49 

Opt 

20 

Square  root  3 

13 

12 

18 

2.000000e+00 

2.86e— 12 

3.37 

Opt 

21 

Square  root  4 

— 

— 

— 

— 

— 

— 

Itr 

22 

Boggs-Tolle  1 

11 

10 

18 

-1.000000e+00 

0.00e+00 

1.26 

Opt 

23 

Boggs- Tolle  2 

12 

12 

20 

3.256820e— 02 

5.80e— 14 

1.53 

Opt 

24 

Boggs-Tolle  3 

5 

SiM 

10 

7.957949e— 01 

0.00e+00 

.70 

Opt1 

25 

Boggs-Tolle  4 

8 

*|SI 

16 

-7.317428e+01 

1.83e— 13 

1.05 

Opt 

26 

Boggs-Tolle  5 

10 

15 

9.617152e+02 

1.77e— 11 

1.41 

Opt 

27 

Boggs-Tolle  6 

25 

34 

2.415051e— 01 

2.23e— 14 

3.75 

Opt 

28 

Boggs-Tolle  7 

— 

— 

— 

— 

— 

Itr 

29 

Boggs-Tolle  8 

21 

24 

1.000000e+00 

9.10e— 13 

2.41 

Opt 

30 

Boggs-Tolle  9 

45 

46 

87 

-1.000001e+00 

2.77e— 09 

6.84 

Opt 

31 

Boggs-Tolle  10 

8 

1 

10 

-1.000000e+00 

2.97e— 09 

.78 

Opt 

32 

Boggs-Tolle  11 

14 

16 

19 

9.171343e— 02 

2.78e— 12 

2.33 

Opt 

33 

Boggs-Tolle  12 

56 

162 

6.1881 19e+00 

7. lie— 13 

9.69 

Opt 

34 

Powell  triangles 

15 

34 

20 

2.331371e+01 

4.56e— 11 

4.02 

Opt 

35 

Powell  bad  scale 

15 

12 

54 

0.000000e+00 

0.00e+00 

1.85 

Opt 

36 

Powell  wriggle  Si 

139 

178 

283 

— 3.171038e— 13 

6.78e— 08 

20.05 

Opt 

37 

Powell  wriggle  S2 

47 

14 

99 

— 3.677403e— 07 

6.44e— 10 

5.12 

Opt 

38 

Powell- Maratos 

■a 

5 

10 

—  1.000000e+00 

5.55e— 17 

.83 

Opt 

39 

HS  72 

US 

1 

7 

7.266819e+02 

1.18e— 16 

.48 

Opt 

40 

HS  73 

1 _ L 

7 

9 

2.989438e+01 

4.44e— 16 

.79 

Opt 

1  Converged  to  a  different  minimizer. 


Table  13.  Small  problems:  (LSSQP-O)  Full  completion  (1-40). 


5-4  Conclusions 
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No. 

Problem 

Itns. 

QP  It. 

Funs. 

F(z') 

viol. 

Time 

Slat. 

41 

HS  107 

— 

— 

— 

— 

— 

— 

Itr 

42 

Mukai-Polak 

39 

51 

99 

5.000000e+00 

2.32e— 13 

6.65 

Opt 

43 

Penalty  1  a 

5 

100 

14 

4.961360e— 02 

0.00e+00 

14.57 

Opt1 

44 

Penalty  1  c 

5 

100 

14 

4.961360e— 02 

0.00e+00 

14.20 

Opt1 

45 

HS  32 

4 

6 

6 

1.000000e+00 

0.00e+00 

.61 

Opt 

46 

HS  46 

24 

26 

30 

6.666278e— 13 

2.16e— 15 

3.57 

Opt 

47 

HS  51 

5 

8 

10 

6.499273e— 02 

0.00e+00 

.75 

Opt 

48 

HS  52 

7 

13 

14 

1.372951e+00 

0.00e+00 

1.08 

Opt1 

49 

HS  53 

5 

7 

10 

7. 95794  9e— 01 

0.00e+00 

.75 

Opt 1 

50 

HS  13 

24 

22 

30 

9.995875e— 01 

8.78e— 12 

2.52 

Opt' 

51 

HS  64 

15 

21 

21 

6.299842e+03 

9.43e— 10 

2.04 

Opt 

52 

HS  70 

36 

39 

41 

7.498464e— 03 

0.00e+00 

6.01 

Opt 

53 

HS  71 

9 

13 

13 

1.701402e+01 

1.78e— 14 

1.43 

Opt 

54 

HS  74 

12 

13 

15 

5.126498e+03 

2.27e— 13 

2.15 

Opt 

55 

HS  75 

8 

10 

5.174413e+03 

2.30e— 10 

1.30 

Opt 

56 

HS  78 

10 

!W 

16 

-2.919700e+00 

3.08e— 13 

1.67 

Opt 

57 

HS  80 

11 

■a 

16 

5.394985e— 02 

0.00e+00 

1.78 

Opt 

58 

HS  81 

12 

12 

17 

5.394985e— 02 

2.22e— 16 

2.05 

Opt 

59 

HS  84 

4 

6 

-5.329025e+06 

0.00e+00 

.59 

Opt 

60 

HS  85 

17 

40 

—  1.905155e+00 

3.55e— 15 

17.33 

Opt 

61 

HS  86 

15 

38 

-3.234868e+01 

0.00e+00 

3.41 

Opt 

62 

HS  93 

26 

58 

1.350760e+02 

3.40e— 11 

5.77 

Opt 

63 

HS  95 

3 

1 

1.561953e— 02 

0.00e+00 

.36 

Opt 

64 

HS  96 

3 

1 

1.561953e— 02 

0.00e+00 

.38 

Opt 

65 

HS  97 

8 

23 

11 

3.135809e+00 

0.00e+00 

1.66 

Opt 

66 

HS  98 

8 

20 

11 

3.135809e+00 

0.00e+00 

1.55 

Opt 

67 

HS  99 

28 

49 

55 

-8.310799e+08 

2. lie— 07 

7.28 

Opt 

68 

HS  100 

20 

28 

31 

6.839810e+02 

1.12e— 09 

4.28 

Opt 

69 

HS  104 

22 

49 

34 

3.951 163e+00 

6.88e— 15 

6.53 

Opt 

70 

HS  109 

18 

56 

24 

5.362069e+03 

4.87e— 13 

7.13 

Opt 

71 

HS  111 

54 

65 

68 

— 4.776109e+01 

2.78e— 17 

11.99 

Opt 

72 

HS  112 

19 

62 

54 

— 4.776109e+01 

0.00e+00 

5.13 

Opt 

73 

HS  113 

60 

148 

107 

2.430621e+01 

1.23e— 08 

23.85 

Opt 

74 

HS  114 

30 

63 

91 

— 8.825283e+02 

5.68e— 14 

11.51 

Opt' 

75 

HS  117 

— 

— 

— 

— 

— 

— 

Itr 

76 

HS  118 

4 

30 

7 

6.648204e+02 

0.00e+00 

3.42 

Opt 

77 

HS  119 

18 

95 

40 

2.448997e+02 

0.00e+00 

11.44 

Opt 

78 

h3  83 

1 

7 

7 

1.012243e+04 

6.89e— 13 

1.08 

Opt 

79 

HS  106 

5 

10 

7 

2.100000e+03 

0.00e+00 

1.27 

Opt ' 

80 

Weapon 

2124 

393 

165 

—  1.735019e+03 

0.00e+00 

176.05 

Opt 

1  Converged  to  a  different  minimizer. 


Table  14.  Small  problems:  (LSSQP-O)  Full  completion  (41-80). 
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No. 

Problem 

QP  It. 

Funs. 

F(*') 

viol. 

Time 

Stat. 

1 

Hexagon 

33 

63 

43 

—  1.349963e+00 

6.73e— 12 

14.87 

Opt 

2 

HS  108 

15 

24 

22 

— 8.660254e— 01 

2.78e— 17 

5.18 

Opt 

3 

KS  372 

23 

67 

38 

1.339009e+04 

9.98e— 09 

11.24 

Opt 

4 

MHW  4 

42 

18 

84 

2.787187e+01 

1.24e— 12 

6.00 

Opt 

5 

MHW  9 

20 

21 

37 

— 4.247468e+01 

1.31e— 13 

3.32 

Opt 

6 

HS  14 

7 

2 

11 

1.393465e+00 

8.60e— 12 

.75 

Opt 

7 

HS  26 

31 

29 

39 

7.717254e— 10 

2.22e— 16 

3.49 

Opt 

8 

HS  43 

18 

21 

28 

— 4.400000e+01 

1.23e— 10 

3.02 

Opt 

9 

HS  65 

21 

22 

33 

9.535288e— 01 

0.00e+00 

2.55 

Opt 

10 

I<S  231 

15 

14 

44 

6.321700e— 21 

0.00e+00 

1.18 

Opt 

11 

OPF  30  BUS 

81 

144 

129 

9.720420e— 01 

1.33e— 08 

384.83 

Opt 

12 

QP  problem 

5 

9 

9 

— 7.750787e+05 

0.00e+00 

.91 

Opt1 

13 

LC7 

12 

19 

15 

7.790963e+05 

0.00e+00 

1.69 

Opt1 

14 

Norway 

6 

13 

8 

-4.086420e+00 

0.00e+00 

1.10 

Opt 1 

15 

Singular 

20 

1 

23 

0.000000e+00 

7.28e— 12 

1.75 

Opt 

16 

Alan  Manne 

24 

46 

29 

-2.670099e+00 

5.62e— 14 

16.72 

Opt 

17 

Steinke2 

7 

14 

14 

4.000131e— 04 

8. Ole— 16 

1.56 

Opt 

18 

Square  root  1 

2 

28 

2.500000e+03 

4.90e— 13 

3.38 

Opt 

19 

Square  root  2 

HI 

HI 

29 

2.999878e+00 

2.27e— 10 

3.22 

Opt 

20 

Square  root  3 

■a 

5 

2.000000e+00 

0.00e+00 

.48 

Opt 

21 

Square  root  4 

■i 

— 

— 

— 

— 

Jtr 

22 

Boggs-Tolle  1 

15 

8 

27 

—  l.OOOOOOe+OO 

2.87e+00 

1.60 

Opt 

23 

Boggs-Tolle  2 

14 

12 

23 

3.256820e— 02 

1.31e— 14 

1.63 

Opt 

24 

Boggs-Tolle  3 

6 

8 

10 

8.117684e— 01 

0.00e+00 

.63 

Opt1 

25 

Boggs-Tolle  4 

9 

6 

13 

— 7.317428e+01 

1.78e— 15 

1.11 

Opt 1 

26 

Boggs-Tolle  5 

11 

8 

16 

9.617152e+02 

0.00e+00 

1.44 

Opt1 

27 

Boggs-Tolle  6 

29 

28 

50 

2.415051e— 01 

1.94e— 12 

4.09 

Opt 

28 

Boggs-Tolle  7 

— 

— 

— 

— 

— 

— 

Itr 

29 

Boggs-Tolle  8 

21 

3 

24 

1.000000e+00 

9.10e— 13 

2.30 

Opt 

30 

Boggs-Tolle  9 

— 

— 

— 

— 

— 

— 

Itr 

31 

Boggs-Tolle  10 

8 

1 

12 

-1.000000e+00 

2.97e— 09 

.79 

Opt 

32 

Boggs-Tolle  11 

15 

16 

22 

9.171343e— 02 

1. lie— 16 

2.34 

Opt 

33 

Boggs-Tolle  12 

70 

56 

275 

6.188119e+00 

2.64e— 11 

11.08 

Opt 

34 

Powell  triangles 

■if 

— 

— 

— 

— 

Itr 

35 

Powell  bad  scale 

— 

— 

— 

— 

Itr 

36 

Powell  wriggle  Si 

— 

— 

— 

— 

Itr 

37 

Powell  wriggle  S2 

9 

— 

— 

— 

— 

Itr 

38 

Powell-Maratos 

9 

10 

— 1.000000e+00 

1.57e— 14 

.74 

Opt 

39 

HS  72 

i 

7 

7.266819e+02 

1.18e— 16 

.47 

Opt 

40 

HS  73 

L 

7 

8 

2.989438e+01 

0.00e+00 

.75 

Opt 

Converged  to  a  different  minimizer. 


Table  15.  Small  problems:  (LSSQP-E)  Early  termination  (1-40). 
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No. 

Problem 

Itns. 

QP  It. 

Funs. 

F(x') 

viol. 

Time 

Siat. 

41 

HS  107 

— 

— 

— 

— 

— 

— 

Itr 

42 

Mukai-Polak 

17 

20 

49 

5.000000e+00 

7.79e— 07 

2.63 

Opt 

43 

Penalty  1  a 

4 

50 

15 

4.961363e— 02 

0.00e+00 

4.93 

Opt1 

44 

Penaltyl  c 

4 

50 

15 

4.961363e— 02 

0.00e+00 

4.64 

Opt 1 

45 

HS  32 

4 

6 

6 

1.000000e+00 

0.00e+00 

.66 

Opt 

46 

HS  46 

40 

41 

49 

3.370541e— 11 

4.72e— 14 

5.75 

Opt 

47 

HS  51 

5 

8 

10 

6.499273e— 02 

0.00e+00 

.72 

Opt1 

48 

HS  52 

14 

21 

50 

7.222497e— 01 

0.00e+00 

1.96 

Opt 1 

49 

HS  53 

5 

10 

8.117684e— 01 

0.00e+00 

.67 

Opt 1 

50 

HS  13 

24 

30 

9.995875e— 01 

8.78e— 12 

2.30 

Opt 

51 

HS  64 

18 

23 

33 

6.299842e+03 

4.00e— 08 

2.32 

Opt 

52 

HS  70 

37 

35 

41 

7.498464e— 03 

0.00e+00 

5.82 

Opt 

53 

HS  71 

15 

17 

19 

1.701402e+01 

1.65e— 12 

2.08 

Opt 

54 

HS  74 

12 

12 

15 

5.126498e+03 

2.27e— 13 

1.98 

Opt 

55 

HS  75 

8 

7 

11 

5.174413e+03 

2.30e— 10 

1.28 

Opt 

56 

HS  78 

— 

— 

— 

— 

— 

— 

Itr 

57 

HS  80 

12 

10 

18 

5.394985e— 02 

0.00e+00 

1.84 

Opt 

58 

HS  81 

11 

9 

15 

5.394985e— 02 

1. lie— 16 

1.63 

Opt 

59 

HS  84 

6 

5 

8 

— 5.329025e+06 

1.46e— 11 

.75 

Opt 

60 

HS  85 

— 

— 

— 

— 

— 

— 

Itr 

61 

HS  86 

5 

9 

9 

-3.234868e+01 

0.00e+00 

.90 

Opt 

62 

HS  93 

18 

26 

25 

1.350760e+02 

8.67e— 14 

3.10 

Opt 

63 

HS  95 

5 

7 

1.561953e— 02 

0.00e+00 

.70 

Opt 

64 

HS  96 

5 

7 

1.561953e— 02 

0.00e+00 

.73 

Opt 

65 

HS  97 

15 

7 

3.135809e+00 

0.00e+00 

1.08 

Opt 

66 

HS  98 

— 

— 

— 

— 

— 

Itr 

67 

HS  99 

— 

— 

— 

— 

— 

Itr 

68 

HS  100 

23 

25 

33 

6.839810e+02 

3.20e— 11 

4.17 

Opt 

69 

HS  104 

26 

48 

56 

3.951 163e+00 

1.21e— 17 

6.92 

Opt 

70 

HS  109 

13 

44 

18 

5.362287e+03 

7.35e— 13 

5.07 

Opt 

71 

HS  111 

68 

55 

88 

— 4.776109e+01 

6.39e— 08 

12.58 

Opt 

72 

HS  112 

24 

60 

74 

—  4.776109e+01 

0.00e+00 

5.06 

Opt 

73 

HS  113 

53 

64 

65 

2.430621e+01 

8.40e— 09 

14.63 

Opt 

74 

HS  114 

— 

— 

— 

— 

— 

— 

Cbt 

75 

HS  117 

31 

193 

304 

1.325514e+03 

0.00e+00 

27.38 

Opt 1 

76 

HS  118 

10 

37 

16 

6.648204e+02 

0.00e+00 

4.36 

Opt 

77 

HS  119 

13 

44 

25 

2.448997e+02 

0.00e+00 

5.48 

Opt 

78 

HS  83 

9 

9 

12 

1.012243e+04 

3.36e— 08 

1.95 

Opt 

79 

HS  106 

7 

7 

9 

2.100000e+03 

0.00e+00 

1.32 

Opt' 

80 

Weapon 

129 

410 

175 

-1. 73501 9e+03 

0.00e+00 

194.84 

Opt 

Converged  to  a  different  minimizer. 


Table  16.  Small  problems:  (LSSQP-E)  Early  termination  (41-80). 


Appendix  A 

Nonlinear  Programming  for 
Trajectory  Optimization 


A.l.  Trajectory  optimization 

Despite  the  empirical  sucess  of  optimization  implementations  such  as  MINOS  and  NPSOL, 
we  can  identify  problems  for  which  improved  performance  is  desirable.  A  class  of  problems 
that  we  feel  will  benefit  from  large-scale  SQP  methods  is  in  the  area  of  trajectory  optimiza¬ 
tion.  In  general  these  mathematical  programming  problems  are  characterized  by  matrices 
that  are  large  and  sparse  and  have  functions  that  are  expensive  to  evaluate.  An  example 
of  a  trajectory  optimization  problem  is  the  Supersonic  Interceptor  Minimum-Time  Climb 
(SIMTC)  problem  [Bry69j.  The  problem  statement  is: 

Find  the  path  taking  a  supersonic  interceptor  from  sea  level  and  Mach  0.34  to 
an  altitude  of  20km  and  Mach  1.0  in  minimum  time. 

Sample  graphs  of  the  optimal  altitude  and  thrust  profiles  (plotted  against  elapsed  time 
of  flight)  for  a  Phantom  F4  are  given  in  Figure  1.  Note  the  non-intuitive  shape  of  the 
optimal  trajectory. 

A. 2.  Problem  statement 

Trajectory  optimization  leads  to  problems  in  optimal  control.  The  goal  is  to  minimize  a 
specified  performance  index  F.  For  the  SIMTC  problem  in  Section  A.l  the  performance 
index  is  the  time  required  to  reach  a  specific  altitude  and  speed.  Other  possibilities  for 
F  are  the  amount  of  fuel  burned  or  the  time  to  reach  a  specific  destination.  Trajectory 
optimization  problems  are  described  in  terms  of  a  sequence  of  N  time  stages  with  time 
points  E ,  (called  events)  delimiting  the  stages.  For  the  general  formulation  we  write  F  as 

F(.r(F),  u(E),u.\  E).  (A.2.1) 

The  performance  index  F  is  a  function  of 

•  A  vector  of  states  x,  governed  by  first-order  differential  equations  (see  below), 

•  Control  functions  u(t)  (e.g.  pitch  angle), 
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Figure  1.  Altitude  and  thrust  profiles  for  the  Phantom-F4  SIMTC  problem 

•  Vehicle  design  parameters  u  (e.g.  rocket  nozzle  diameter), 

•  Time  points  £,,  i  =  1, . . . ,  N  +  1,  delimiting  the  stages. 

The  »-th  stage  is  a  dynamical  system  restricted  by  differential  constraints  (called  state 
equations)  of  the  form 

i  dx 

Xi  =  —  =  Mx,u,u,t)  1€[Ei,Ei+ 1],  (A.2.2) 

for  i  =  These  constraints  correspond  to  differential  equations  of  motion.  The 

variables  must  also  satisfy  nonlinear  initial  and  terminal  conditions  a,  at  each  stage: 

l i  <  «,(*(£,),  (£,),u;)  <  uf.  (A.2.3) 

In  addition,  path  constraints  /i,  may  be  imposed  on  the  system  at  each  stage 

/"  <  /»,(*,  u. u;J.)  <  uf.  (A.2.4) 

The  functions  /,,  a,  and  A*  are  assumed  to  be  twice  continuously  differentiable  within 
each  stage.  However,  the  functions  are  allowed  to  be  discontinuous  between  events.  That 
is,  at  event  boundaries,  discontinuities  of  the  form 

*(£.+ i  )  =  x(Ei)  +  <7,  (A.2.5) 

are  allowed.  These  allow  the  modelling  of  characteristics  such  as  the  jettison  of  a  payload 
or  a  modification  of  velocity.  The  <r,'s  may  be  fixed  or  included  in  the  design  parameter 
set  u. 

Hargraves  and  Paris  [HarPST]  presented  a  direct  trajectory  optimization  method  of 
the  form  ( A. 2.1 )— ( A.2.5)  that  represents  state  and  control  variables  bv  piecewise  polyno¬ 
mials.  This  method  has  been  developed  into  OTIS,  a  system  for  trajectory  optimization 
(HarP88j.  Specifically,  Hargraves  and  Paris  transformed  the  optimal  control  problem  into 
a  mathematical  programming  problem  by  using  an  implicit  integration  scheme  known 
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as  collocation  (see  [Enr91])  to  satisfy  equations  (A. 2.2).  This  method  of  transforming 
the  optimal  control  problem  into  a  mathematical  programming  problem  is  called  direct 
transcription. 

A  complete  description  of  the  transcription  method  used  to  approximate  (A.2.2)  can 
be  found  in  [Enr91]  or  [HarP87].  The  method  is  summarized  below: 

1.  Each  stage  [£,,£,+ 1]  is  partitioned  into  a  set  of  M  smaller  segments. 

2.  A  cubic  spline  is  fitted  for  each  segment.  The  data  for  the  fit  is  taken  from  the  values 
of  ^  at  the  mesh  points  of  each  segment. 

3.  A  numerical  integration  scheme  is  used  to  approximate  x(t)  at  the  midpoint  of  each 
segment. 

4.  Variables  6i  (called  defects)  are  defined  as  the  difference  between  the  approximation 
and  the  true  value  at  the  midpoint. 

If  the  defects  can  be  driven  to  zero,  the  cubic  spline  will  provide  an  accurate  approximation 
to  (A.2.2).  As  a  result  of  this  transcription  process,  the  optimal  control  constraints  (A.2.2) 
can  be  replaced  in  the  formulation  by  equality  constraints  of  the  form  6j  =  0  for  i  — 
1, . . . ,  M,  for  each  of  the  N  events  for  the  problem. 

A. 2.1.  Problem  formulation 

The  mathematical  programming  problem  now  includes  terms  for  the  defects  of  the  in¬ 
terpolation  method  in  place  of  equations  (A.2.2).  In  addition,  the  boundary  conditions 
(A. 2.3)  are  enforced  and  the  nonlinear  path  constraints  (A.2.4)  are  enforced  at  the  grid 
points.  The  transcribed  trajectory  optimization  problem  may  be  written  as  the  following 
mathematical  program: 


minimize 

s.t. 

F(: r,  u.  E,u) 

' 1 , 

_ 

0, 

i=l,., 

..,A, 

it  < 

flI(.r(£i).(£'i),w) 

< 

uf. 

t  =  1,. , 

..,A, 

^  < 

hi(x,u,u;J) 

< 

*  =  1 ,  - 

x{E,+ 1)  -  x (Ei)  -  cr, 

= 

0, 

i  =  1,.. 

...A, 

lB  < 

(*,  M,  E,U>) 

< 

where  rf;  is  a  vector  of  center  defects  for  stage  i  ( d,  =  [5,i, . . . ,  £,a/]). 
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